This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
There are several frameworks that attempt to address the challenges of evaluation of health information systems by offering models, methods, and guidelines about what to evaluate, how to evaluate, and how to report the evaluation results. Model-based evaluation frameworks usually suggest universally applicable evaluation aspects but do not consider case-specific aspects. On the other hand, evaluation frameworks that are case specific, by eliciting user requirements, limit their output to the evaluation aspects suggested by the users in the early phases of system development. In addition, these case-specific approaches extract different sets of evaluation aspects from each case, making it challenging to collectively compare, unify, or aggregate the evaluation of a set of heterogeneous health information systems.
The aim of this paper is to find a method capable of suggesting evaluation aspects for a set of one or more health information systems—whether similar or heterogeneous—by organizing, unifying, and aggregating the quality attributes extracted from those systems and from an external evaluation framework.
On the basis of the available literature in semantic networks and ontologies, a method (called Unified eValuation using Ontology; UVON) was developed that can organize, unify, and aggregate the quality attributes of several health information systems into a tree-style ontology structure. The method was extended to integrate its generated ontology with the evaluation aspects suggested by model-based evaluation frameworks. An approach was developed to extract evaluation aspects from the ontology that also considers evaluation case practicalities such as the maximum number of evaluation aspects to be measured or their required degree of specificity. The method was applied and tested in Future Internet Social and Technological Alignment Research (FI-STAR), a project of 7 cloud-based eHealth applications that were developed and deployed across European Union countries.
The relevance of the evaluation aspects created by the UVON method for the FI-STAR project was validated by the corresponding stakeholders of each case. These evaluation aspects were extracted from a UVON-generated ontology structure that reflects both the internally declared required quality attributes in the 7 eHealth applications of the FI-STAR project and the evaluation aspects recommended by the Model for ASsessment of Telemedicine applications (MAST) evaluation framework. The extracted evaluation aspects were used to create questionnaires (for the corresponding patients and health professionals) to evaluate each individual case and the whole of the FI-STAR project.
The UVON method can provide a relevant set of evaluation aspects for a heterogeneous set of health information systems by organizing, unifying, and aggregating the quality attributes through ontological structures. Those quality attributes can be either suggested by evaluation models or elicited from the stakeholders of those systems in the form of system requirements. The method continues to be systematic, context sensitive, and relevant across a heterogeneous set of health information systems.
In one aspect at least, the evaluation of health information systems matches well with their implementation: they both fail very often [
Standing as a cornerstone for evaluation is our interpretation of what things constitute success in health information systems. A body of literature has developed concerning the definition and criteria of a successful health technology, in which the criteria for success go beyond the functionalities of the system [
To map the definition of success of health information systems onto real-world cases, certain evaluation frameworks have emerged [
Evaluation frameworks offer a wide range of components for designing, implementing, and reporting an evaluation, among which are suggestions or guidelines for finding out the answer to
To identify evaluation aspects, evaluation frameworks can take two approaches: top down or bottom up. Frameworks that take a top-down approach try to specify the evaluation aspects through instantiating a model in the context of an evaluation case. Frameworks that focus on finding, selecting, and aggregating evaluation aspects through interacting with users, that is, so-called user-centered frameworks, take a bottom-up approach.
In the model-based category, TAM and TAM2 have wide application in different disciplines including health care [
Some model-based frameworks extend further by taking into consideration the relations between the elements in the model. The Fit between Individuals, Task and Technology model includes the
Outcome-based evaluation models, such as the Health IT Evaluation Toolkit provided by the Agency for Healthcare Research and Quality, consider very specific evaluation measures for evaluation. For example, in the previously mentioned toolkit, measures are grouped in domains, such as
In contrast to model-based approaches, bottom-up approaches are less detailed about the landscape of evaluation aspects; instead, they form this landscape from what they elicit from stakeholders. Requirement engineering, as a practice in system engineering and software engineering disciplines, is expected to capture and document, in a systematic way, user needs for a to-be-produced system [
The advantages of elicitation-based approaches, such as requirement engineering, result from an ability to directly reflect the case-specific user needs in terms of functionalities and qualities. Elicitation-based approaches enumerate and detail the aspects that need to be evaluated, all from the user perspective. Evaluation aspects that are specified through the requirement engineering process can be dynamically added, removed, or changed due to additional interaction with users or other stakeholders at any time. The adjustments made, such as getting more detailed or more generic, are the result of new findings and insights, new priorities, or the limitations that arise in the implementation of the evaluation.
The advantages in the requirement engineering approach come at a cost of certain limitations compared with model-based methods. Most of the requirement elicitation activities are accomplished in the early stages of system development, when the users do not have a clear image of what they want or do not want in the final system [
Being case specific through requirement engineering processes has a side effect: each case yields a different set of evaluation aspects, and these sets can even be mutually heterogeneous. Model-based approaches might perform more uniformly in this regard, as they try to enumerate and unify the possible evaluation aspects through their models, imposing a kind of unification from the beginning. However, a group of studies still asks for measures to reduce the heterogeneity of evaluation aspects in these approaches [
Heterogeneity makes evaluation of multiple cases or aggregation of individual evaluations a challenge. In a normative evaluation, comparability is the cornerstone of evaluation [
In health technology, the challenge of heterogeneity for comparison and evaluation can be more intense. The health technology assessment literature applies a very inclusive definition of
By extracting the lowest common denominators from among evaluation subjects, thereby creating a uniform context for comparison and evaluation, we can tackle the challenge of heterogeneity via elicitation-based evaluation approaches. Vice versa, the evaluation aspects in an evaluation framework suggest the common denominators between different elements. The lowest common denominator, as its mathematical concept suggests, expands to include elements from all parties, where the expansion has been kept to the lowest possible degree.
Usually, there are tradeoffs and challenges around the universality of an evaluation aspect related to how common it is and its relativeness (ie, how low and close to the original elements it lies). When the scopes differ, their nonoverlapped areas might be considerable, making it a challenge to find the common evaluation aspects. Furthermore, the same concepts might be perceived or presented differently by different stakeholders [
It is possible to merge the results of model-centered and elicitation-centered approaches. The merged output provides the advantages of both approaches while allowing the approaches to mutually cover for some of their challenges and shortcomings.
The aim of this paper is to address the question of
The rest of this paper is structured as follows. The research method that resulted in the UVON method is described in the Methods section. The result, that is, the UVON method, is covered in the section The UVON Method for Unifying the Evaluation Aspects, whereas its application in the context project is covered in the section Result of the UVON Method Application in the FI-STAR Project. The rationale behind the method is discussed in the Discussion section, and the possible extensions and limitations are found in the sections Extending the Evaluation Using the Ontology and Limitations of the UVON Method. The Conclusions section summarizes the conclusions of the paper.
The FI-STAR project is a pilot project in eHealth systems funded by the European Union (EU). The evaluation of the FI-STAR project has been the major motive, the empirical basis, and the test bed for our proposed evaluation method, that is, the UVON method (to be described in Results section). FI-STAR is a project within the Future Internet Public-Private Partnership Programme (FI-PPP) and relates to the Future Internet (FI) series of technology platforms. The project consists of 7 different eHealth cloud-based applications being developed and deployed in 7 pilots across Europe. Each of these applications serves a different community of patients and health professionals [
A general review of the existing evaluation frameworks was done. Existing model-based evaluation frameworks, which usually suggest universal quality attributes for evaluation, could not cover all the quality attributes (ie, evaluation aspects) reflected by the requirement documents of the pilot projects in FI-STAR. Even if there was a good coverage of the demanded evaluation aspects, there was still no guarantee that they could maintain the same degree of good coverage for the future expansions of the FI-STAR project. On the other hand, the requirement documents from the FI-STAR project were not expected to be the ultimate sources for identifying those quality attributes. It was speculated that there could exist other relevant quality attributes that were captured in the related literature or embedded in other, mostly model-based, health information system evaluation frameworks. For these reasons, it was decided to combine quality attributes both from the FI-STAR sources and a relevant external evaluation framework. To find other relevant evaluation aspects, a more specific review of the current literature was performed that was more focused on finding an evaluation framework of health information systems that sufficiently matched the specifications of the FI-STAR project. The review considered the MAST framework [
Regarding the heterogeneity of FI-STAR’s 7 pilot projects, an evaluation mechanism was needed to extract common qualities from different requirement declarations and unify them. A review of the related literature showed that the literature on ontologies refers to the same functionalities, that is, capturing the concepts (quality attributes in our case) and their relations in a domain [
A method was developed to organize and unify the captured quality attributes via requirement engineering into a tree-style ontology structure and to integrate that structure with the recommended evaluation aspects from another evaluation framework. The method was applied for the 7 pilots of the FI-STAR project, which resulted in a tree-style ontology of the quality attributes mentioned in the project requirement documents and the MAST evaluation framework. The top 10 nodes of the tree-style ontology were chosen as the 10 aspects of evaluation relevant to the FI-STAR project and its pilot cases.
Methodical capture of a local ontology [
The ontology structure, in its tree form, is the backbone of the UVON method. Modern ontology definition languages can show different types of relations, but for the sake of our method here, we only use the
An example snapshot of the output ontology while running the UVON method.
The UVON method is composed of 3 phases: α, β, and γ (
The β ontology construction begins with a special initial node (ie, quality attribute) that is called
The first quality attribute simply needs to add itself as the child of the
The journey ends at some point in one of the following situations. If the new root quality attribute (Q_n) has no children, the traveling quality attribute (Q_t) is added as its child and the journey ends. The same applies if Q_n has children but none of them is a superclass or a subclass of the traveling quality attribute. Besides these two situations, it is possible that no child is a superclass but one or more of them are subclasses of the traveling quality attribute (Q_t). In this situation, Q_t itself becomes a child of the new root quality attribute, and those child quality attributes move down to become children of Q_t.
To keep the ontology as a tree, if a traveling quality attribute (Q_t) finds more than one superclass child of itself in a given situation, then it should replicate (fork) itself into instances, as many as the number of those children, and go through each branch separately. It is important to note that, logically, this replication cannot happen over two disjoint (mutually exclusive) branches. It is also possible to inject new quality attributes in between a parent node and children, but only if it does not break subclass or superclass relations. This injection can help to create ontologies in which the nodes at each level of the tree have a similar degree of generality, and each branch of the tree grows from generic nodes to more specific ones.
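The traveling step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Node`, `insert`, `toy_oracle`, and `ROOT` are hypothetical, the superclass or subclass judgement (a human decision in the method) is stood in for by a callable, and the early return for an identically named child is an assumed way of unifying redundant attributes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


# is_sub(a, b) answers "is a a subclass of b?". In the method this is a human
# judgement; here it is any callable supplied by the caller (an assumption).
IsSub = Callable[[str, str], bool]


def insert(root: Node, q: str, is_sub: IsSub) -> None:
    """Let the traveling attribute q travel down from root until it settles."""
    if any(c.name == q for c in root.children):
        return  # assumption: an identically named attribute is already unified here
    supers = [c for c in root.children if is_sub(q, c.name)]
    if supers:
        # Descend into every superclass child; more than one means forking q.
        for child in supers:
            insert(child, q, is_sub)
        return
    # No superclass child: q settles here and adopts children that are its subclasses.
    subs: List[Node] = []
    keep: List[Node] = []
    for c in root.children:
        (subs if is_sub(c.name, q) else keep).append(c)
    root.children = keep + [Node(q, subs)]


# Toy oracle with illustrative names (not the FI-STAR ontology).
SUBCLASS_OF = {("Clinical safety", "Safety"), ("Technical safety", "Safety")}


def toy_oracle(a: str, b: str) -> bool:
    return (a, b) in SUBCLASS_OF


root = Node("ROOT")
for attribute in ["Clinical safety", "Accessibility", "Safety", "Technical safety"]:
    insert(root, attribute, toy_oracle)
```

In this toy run, "Safety" arrives after "Clinical safety" and adopts it as a child, which mirrors the situation where existing children move down under the traveling quality attribute.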
This customized depth-first tree traversal algorithm, which actually constructs a tree-style ontology instead of just traversing one, is considered semiautomated, as it relies on human decision in two cases. The first case is when the superclass to subclass relation between two quality attributes must be judged. The gradual development of the ontology through the UVON method spreads the decisions about superclass to subclass relations across the course of ontology construction. The unification of heterogeneous quality attributes (nodes) is the result of accumulating these distributed decisions, which are embodied as superclass to subclass relations. Each of these relations (ie, decisions) brings at least 2 separate quality attributes closer together by representing them through more generic quality attributes.
In addition, one can inject a new quality attribute to the ontology tree, although that quality attribute is not explicitly mentioned in the requirement documents. This injection is only allowed when that quality attribute summarizes or equals a single or a few sibling quality attributes that are already in the ontology. The injection can improve clarity of the ontology. It can also help adjust the branches of the ontology tree to grow to a certain height, which can be helpful when a specific level of the tree is going to be considered as the base for creating a questionnaire. This adjustment of branch height might be needed if a branch is not tall enough to reach a specific level, meaning none of the quality attributes in that branch gets presented in the questionnaire. In addition, if a quality attribute is very specific compared with other quality attributes in that level of the tree, the questions in the questionnaire become inconsistent in their degree of generality. This inconsistency can be handled by injecting more generic quality attributes above the existing leaf node in the branch. All the previously mentioned benefits come with the cost of subjectivity in introducing a new quality attribute.
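The injection operation can be illustrated with a short Python sketch. It assumes a simple in-memory tree; the names `Node`, `inject`, and the `injected` flag are hypothetical and only serve to keep the subjectively introduced nodes traceable. Whether the new attribute really summarizes the grouped siblings remains a human judgement that the code does not check.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    injected: bool = False  # True when the node was not in the requirement documents


def inject(parent: Node, new_name: str, sibling_names: List[str]) -> Node:
    """Insert a more generic node between parent and some of its children."""
    wanted = set(sibling_names)
    grouped = [c for c in parent.children if c.name in wanted]
    if len(grouped) != len(wanted):
        raise ValueError("every grouped attribute must be an existing child of parent")
    new_node = Node(new_name, grouped, injected=True)
    # The grouped siblings move one level down; the other children stay in place.
    parent.children = [c for c in parent.children if c.name not in wanted] + [new_node]
    return new_node


# Illustrative names only; this is not the FI-STAR ontology.
root = Node("ROOT", [Node("Clinical safety"), Node("Technical safety"), Node("Efficiency")])
safety = inject(root, "Safety", ["Clinical safety", "Technical safety"])
```

Marking injected nodes makes it possible to review them later, which may help to keep the subjectivity of this step visible.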
The γ phase ontology is constructed in the same way as the β phase ontology, but it adds material (quality attributes) from external sources. To that end, the quality attributes specified in an external evaluation framework, probably a model-based one, should be extracted first. Those quality attributes should then be fed into the β ontology in the same way as other quality attributes during the β phase. The UVON method does not discriminate between quality attributes by their origin, but it might be good practice to mark the quality attributes that originate from the external evaluation framework if we later need to make sure they are used by their original names in the summarizing level (to be discussed in the following paragraphs).
Each level of the resulting ontology tree(s)—except those that are deeper than the length of the shortest branch—represents or summarizes quality attributes of the whole system in some degree of generality or specificity. That of the
Ontology construction for a health information system.
The quality attributes in each of the other levels (such as L_1 in
More details can be evaluated by looking at deeper nodes in the ontology structure.
Harvesting the value-cases and requirement documents for all 7 trial-cases in the FI-STAR project provided the initial set of quality attributes, that is, the α set. Several quality attributes were redundant or similar, but it was left to the UVON method to unify them. There were also several quality attributes with the same wording but different conceptual indications in their respective usage contexts. These quality attributes were added to the α set with small modifications to differentiate them from each other. For example, 2 different references to
In the next step, that is, β phase, the UVON method developed β ontology by using the α set. The redundant quality attributes were integrated into single entities, whereas other quality attributes were grouped by their direct or indirect parents in the ontology structure regarding their degree of similarity or dissimilarity.
In addition, it was noticed that quality attributes are preferred—although not necessarily always—to be noun phrases rather than adjective phrases; this is because fulfilling a quality attribute expressed in an adjective phrase could imply that all of its child quality attributes need to be fulfilled. For example, to fulfill the quality of being
Applying the UVON method in its β and γ phases, respectively, created the β and γ ontology structures (γ in
The MAST framework specifies 7 evaluation domains, where each contains several topics (aspects or sub-aspects) [
Both the β and γ ontology structures were described in Web Ontology Language (OWL) using Protégé version 4.x software. OWL, as an ontology language, can describe a domain of knowledge through its lingual elements and their relations [
The mapping between MAST evaluation aspects and the final evaluation aspects for the FI-STAR project using UVON.
MAST domain or aspect | Final top aspect
[a] |
Clinical safety (patients and staff) | Safety
Technical safety (technical reliability) | Safety
[b] |
Effects on mortality | [b]
Effects on morbidity | [b]
Effects on health-related quality of life (HRQL) | [b]
Behavioral outcomes | [b] (but can relate to adhereability)
Usage of health services | [b] (but can relate to adhereability)
Satisfaction and acceptance | [c]
Understanding of information | Accessibility
Confidence in the treatment | Trustability and authenticity
Ability to use the application | Accessibility
Access and accessibility | Accessibility
Empowerment, self-efficacy | Empowerment
Amount of resources used when delivering the application and comparators | Efficiency
Prices for each resource | Efficiency
Related changes in use of health care | [a]
Clinical effectiveness | [b]
Expenditures per year | Affordability
Revenue per year | [b]
Process | [a] (but can relate to efficiency)
Structure | [a]
Culture | [a]
[b] |
[a] Not a quality attribute.
[b] Not included because of the FI-STAR project definition and division of tasks.
[c] Had been already covered by some generic questions in the output questionnaire.
Some generic nodes were inserted to group sibling nodes that were conceptually closer together in the ontology structure. If a quality attribute was connected to 2 different branches, it was forked and presented in both branches (as described before); this keeps the ontology a tree rather than a directed acyclic graph.
Applying the UVON method in the FI-STAR project case, at the end of the γ phase, 10 nodes appeared below the root of the ontology tree (
Quality name
Accessibility
Adhereability
Affordability
Authenticity
Availability
Efficiency
Effectiveness
Empowerment
Safety
Trustability
In the FI-STAR project, the measurement of evaluation aspects was performed through a questionnaire based on those 10 extracted aspects in the γ ontology. Two versions of the questionnaire were created: one for the patients and one for the health professionals, where each expressed the same concept in 2 different wordings (note: one operating theatre case did not have a patient questionnaire).
Generally and regarding practicalities of an evaluation case, it is possible to consider deeper levels of the resulting γ ontology in a given case. In the FI-STAR case, this possibility is reflected in a sample question on
In the FI-STAR project, the quality attributes (and later the questionnaires) were delivered to each case’s stakeholders, who were asked to validate the relevancy of each quality attribute or the corresponding question regarding their case. All the cases in the FI-STAR project validated and approved their relevancy, whereas some asked for minor changes in the wordings of some of the questions to be clearer for the patient respondents in their case.
Sample questionnaire output from the UVON method.
Ontologies are formal and computable ways of capturing knowledge in a domain—whether local or global [
An ontology would be formed as a hierarchy if the relations between the concepts are limited to the
Ontologies are traditionally the output of manual content curation and its associated consensus-establishment processes [
The ontological representation of a health information system gives a computable structure from which several indications, including evaluation aspects, can be extracted. Functions can be defined on this ontology that quantify, combine, compare, or select some of the nodes or branches. The ontology itself can be extended by assigning values to its nodes and edges, giving the possibility of further inferences. For example, if 2 nodes (quality attributes) are disjoint (mutually exclusive), any 2 children from each of them would be disjoint, respectively. If during the application of the UVON method, by mistake, one quality attribute were replicated into 2 disjoint branches, then this mistake can be detected and avoided automatically (replication would be disallowed between those specific nodes).
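The disjointness inference mentioned above can be sketched as follows. This is a hedged illustration, not part of the published method: the tree is assumed to be stored as a child-to-parent map, disjointness is declared for pairs of nodes and inherited by their descendants, and the names `lineage`, `are_disjoint`, `invalid_forks`, and the placeholder node labels are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set

Parent = Dict[str, Optional[str]]  # child name -> parent name (root has no entry)
Declared = Set[FrozenSet[str]]     # pairs of nodes declared mutually exclusive


def lineage(node: str, parent: Parent) -> List[str]:
    """Return the node followed by all of its ancestors up to the root."""
    chain: List[str] = []
    current: Optional[str] = node
    while current is not None:
        chain.append(current)
        current = parent.get(current)
    return chain


def are_disjoint(a: str, b: str, parent: Parent, declared: Declared) -> bool:
    """Two nodes are disjoint if they, or any of their ancestors, were declared so."""
    return any(
        frozenset((x, y)) in declared
        for x in lineage(a, parent)
        for y in lineage(b, parent)
    )


def invalid_forks(placements: Dict[str, List[str]], parent: Parent,
                  declared: Declared) -> List[str]:
    """List forked attributes whose copies sit under mutually disjoint nodes."""
    bad: List[str] = []
    for name, hosts in placements.items():
        pairs = ((a, b) for i, a in enumerate(hosts) for b in hosts[i + 1:])
        if any(are_disjoint(a, b, parent, declared) for a, b in pairs):
            bad.append(name)
    return bad


# Placeholder tree: X and Y are declared disjoint, Z is unrelated to both.
parent = {"X": "ROOT", "Y": "ROOT", "Z": "ROOT", "X1": "X", "Y1": "Y"}
declared = {frozenset(("X", "Y"))}
# Attribute Q was forked under X1 and Y1; attribute R under X1 and Z.
flagged = invalid_forks({"Q": ["X1", "Y1"], "R": ["X1", "Z"]}, parent, declared)
```

A check of this kind could be run before a replication is accepted, so that a fork across disjoint branches is rejected rather than repaired afterward.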
As discussed in “Result of the UVON Method Application in the FI-STAR Project” section and shown in
In addition, the selection of the MAST framework was due to its common themes with the eHealth applications in the FI-STAR project. We encourage application of the UVON method by considering other relevant evaluation frameworks, not necessarily MAST. The results of those applications can demonstrate the powers, weaknesses, and extension points of the UVON method.
The UVON method is context-insensitive in its approach. Still, more empirical evidence, with a higher degree of diversity, is needed to examine what the challenges or advantages of applying the UVON method are in a more diverse range of fields beyond health information systems.
The UVON method is subject to conceptual and methodological limitations in its capacities. Probably, a prominent conceptual limitation is the fact that the method does not represent or give an account of the dynamics of the health information systems; hence, it cannot facilitate their evaluation. The relations in the UVON-constructed ontologies are restricted to the
The UVON method partially relies on subjective decision-making, which can create methodological limitations and challenges. Although the main strategy in the UVON method is to minimize these subjective decisions, the existing ones can still result in creating different ontologies in different applications of the method. As a suggestion, for the sake of reaching more convergence, it is possible to think of enhancing the method with more objective lexical analytical methods. Methods of ontology construction and integration, especially those concerning class inheritance analysis [
UVON-generated ontologies are not advised for universal application. However, for a new case of evaluation, a UVON-generated ontology that was developed for similar cases can be considered as an alternative to developing a new ontology with consideration to project resource limitations. This reuse should be accomplished with due consideration to the fact that quality attributes of the same wording might indicate slightly different meanings in different cases. This case-sensitivity of meanings might result in different subclass and superclass relations, changing the structure of the ontology and making the reuse of the unadjusted ontology problematic.
The UVON method cannot guarantee that in the output ontology each of the branches that begin from the root will reach the level of the tree (that is, have a node at that level) where we want to base our questionnaire (or any other measurement method). Hence, a short branch might need to be extended to appear at some specific tree level where the questionnaire is based. In addition, the method does not guarantee that the quality attributes in that level are all of the same degree of generality or specificity. It is also not guaranteed that the number of nodes (quality attributes) at any level matches the practicalities of evaluation; there can be too few or too many. For example, in the FI-STAR case, the number of quality attributes in the target level (level 2) had to match with the appropriate maximum number of questions that could be put in a questionnaire; fortunately, it was within the boundaries.
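These practical checks can be made explicit with a small Python sketch. It is an illustration under assumptions: the ontology is represented as nested dictionaries below an implicit root, the direct children of the root are counted as level 1 (a convention chosen here, which may differ from the numbering used elsewhere in this paper), and the function names and the toy tree are hypothetical.

```python
from typing import Dict, List

Tree = Dict[str, "Tree"]  # maps a node name to the subtree of its children


def nodes_at_level(tree: Tree, level: int) -> List[str]:
    """Collect node names at the given level; level 1 is directly below the root."""
    if level == 1:
        return list(tree)
    found: List[str] = []
    for subtree in tree.values():
        found.extend(nodes_at_level(subtree, level - 1))
    return found


def height(subtree: Tree) -> int:
    """Number of levels below a node (0 for a leaf)."""
    return 0 if not subtree else 1 + max(height(s) for s in subtree.values())


def short_branches(tree: Tree, level: int) -> List[str]:
    """Top-level branches that end before reaching the target level."""
    return [name for name, sub in tree.items() if 1 + height(sub) < level]


def fits_questionnaire(tree: Tree, level: int, max_questions: int) -> bool:
    """Check that the number of aspects at the level does not exceed the limit."""
    return len(nodes_at_level(tree, level)) <= max_questions


# Toy ontology with placeholder names.
toy: Tree = {"A": {"A1": {}, "A2": {}}, "B": {}, "C": {"C1": {"C1a": {}}}}
```

In the toy tree, branch "B" would be missing from a questionnaire based on level 2, which is the case where extending a short branch becomes necessary.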
It is also possible, at least in theory, that all quality attributes end up being a direct child of the root
The UVON method permits integrating evaluation aspects from other evaluation frameworks. Still, it does not guarantee that the result will include all features of the integrated evaluation framework. The integration involves only the evaluation aspects that those frameworks suggest. If a framework dynamically changes its suggested evaluation aspects, for example, based on the evaluation case specifications, the UVON method does not follow that dynamic feature. In addition, the straightforward wordings for an evaluation aspect in an evaluation framework might be obscured by going through the integration process in the UVON method, being replaced by more generic terms.
The unifying nature of ontologies, when they are in tree form, can be used to create a common ground of evaluation for heterogeneous health technologies. Ontologies can originate from requirement and value-case documents, that is, internal sources; they can be extracted from available external evaluation frameworks, that is, external sources; or they can originate from a mix of both. The UVON method introduced in this paper was able to create a common ground for evaluation by creating an ontology from requirement and value-case documents of the 7 trial projects in the FI-STAR project and extending that ontology with elements from the MAST evaluation framework. The UVON method can be used in other, similar cases to create ontologies for evaluation and to mix them with elements from other evaluation frameworks.
The UVON method stands in contrast with other methods that do not consider case-specific internal requirements or cannot be easily extended to include other evaluation frameworks. The ontological structure of evaluation aspects created by the UVON method offers the possibility of further investigations for other indications related to evaluation of the subject systems.
Applying the UVON method in the FI-STAR project resulted in 10 evaluation aspects chosen for measurement. This set of evaluation aspects can grow adaptively to project changes, be repeated in similar cases, and be a starting point for future evaluations in similar projects. By applying the UVON method in more cases, a possible stable result can be suggested for the set of generic evaluation aspects that are usable in evaluation cases similar to FI-STAR.
UVON-generated Ontology (in OWL) for the FI-STAR project.
EU: European Union
FI: Future Internet
FI-STAR: Future Internet Social and Technological Alignment Research
FI-PPP: Future Internet Public-Private Partnership Programme
FITT: Fit between Individuals, Task and Technology
HOT-fit: Human, Organization, and Technology Fit
INAHTA: International Network of Agencies for Health Technology Assessment
MAST: Model for ASsessment of Telemedicine applications
OWL: Web Ontology Language
STARE-HI: Statement on the Reporting of Evaluation studies in Health Informatics
TAM: Technology Acceptance Model
TAM2: Technology Acceptance Model 2
UTAUT: Unified Theory of Acceptance and Use of Technology
UVON: Unified eValuation using Ontology
The authors would like to acknowledge the contribution of project partners from the FI-STAR project for providing the context of this study. The FI-STAR project is funded by the European Commission under the Seventh Framework Programme (FP7), under grant agreement number 604691.
Regarding the contributions, SE drafted the paper, incorporated contributions from other authors into the paper, contributed to the design of the study, developed the proposed model and method, processed data for β and γ phases of the proposed method, and contributed to the proposed method final result. PA contributed to the design of the study, contributed to the proposed method final result, supervised the research process, and reviewed and commented on the paper. TL contributed to the design of the study, supervised the research process, and reviewed and commented on the paper. SF collected data for the α phase of the proposed method and reviewed and commented on the paper. JB contributed to the design of the study, supervised the research process, and reviewed the paper.
None declared.