Medical Informatics I: Principles of Database Design

Article Index
Medical Informatics I: Principles of Database Design
Page 2
Page 3
Page 4
All Pages

Page 3 of 4

Transcript Part III: Data Models:

One of the fundamental assertions of this class and I have already said it but will say it again is that the data model is really the most critical aspect of system design and function. That is where you should begin when you are in the business of either working with a vendor, or designing your own new system. Data models, to endure, should really reflect the real world objects that they describe and the relationships between those objects. And I will show you more of what I mean by that in a little bit. A correct data model (as well as an incorrect one) subserves and outlasts the original application and many not anticipated at systems start up. The object is not the instance. The object is the abstraction of all members of that class, all objects that share those same characteristics. And we group those objects by criteria that represent the abstract as an empty table. This is a relational model here, but actually you can model other underlying databases the same way. So, this is not everything that defines a dog, but for purposes of our system it contains the characteristics that are of interest to us. We have a dog name, a dog breed, its favorite food and its birth date, as an example of that set of abstracted characteristics that are meaningful to us for purposes of whatever system we are designing.

Those empty tables then can be filled in with "instances"--in computer science the two dollar word for this is "instantiation". And an "instantiation" just means we have created a new row that has actual values for each of the classes--the categories of things that are in the empty table format.

So you can see some of the examples, there. This model can be used to represent not only objects but their relationships in the real world, and events and things that happen in time and places. So the relationships between objects are themselves "attributes", that is, characteristics of the objects. And those attributes themselves can be modeled in relational tables.

Here is an example of a relationship where there is a dog table and each dog has an owner name. And that hence the relationship to another table, where this is a human being who has the relationship of owning the dog. So we have linked data from two different tables by virtue of common occurrence of the column in both of those table structures.

The important rules about which one does this, are that all of the real world things in the set; that is, the instances, the rows as you create them and fill in the table have to have all the same characteristics. This is a common problem in database design, that you could have a model that looks absolutely sensible, but fails.

Here is a table called license, and it has characteristics of the license number, the expiration date, the manufacturer the model. Here is a row that is an instance of the car that sits very nicely, but when we try to put FiFi in here, which is a different kind of object from the real world, even though it may have a license, these characteristics don't match. So what happens...you either get holes in the table because that characteristic simply doesn't apply...you are trying to store two different kinds of objects in the same table design...or you get very, very strange values. And what happens then, is you are writing computer programs to look at the values and decide what am I going to do with this one. Do I do something different with it than with the other one?

This seems ludicrous but I will tell you it happens more often than you can imagine, in what people call, "data model creep", where someone creates the first data model, looks pretty good, and everything fits together. And the next person comes along and says, "But we need to add a new field. Where shall we stick it?" And they often violate the sensibility of the object definitions implicit in the original table design.

So all instances have to conform to the same rules. There are a number of different types of objects. They represent tangible things like books and dogs and such. They also can used to represent relationship and role. So one, maybe a human being that has multiple roles. I am a doctor with respect to this person. I am a credit card holder with respect to a credit organization. I am a borrower with respect to the library. So the notion of "roles" joining these physical objects--the things they do in the world--are one of the underlying powers of these relational models.

We can also use these tables to represent things that occur in time and space--incidents or events such as the ordering of a lab test. We can also bind tables together with interactions. For example, we could have a table called "purchase" that binds a buyer to a seller, and perhaps to an object that has been bought or sold. And we can even use tables, and most commonly do, with what are called "computer-assisted software engineering tools"-- to define the content of the system, so that you have data-driven systems, where the data dictionary controls the behaviors of how the database system works.

This is a common notation...there are three major alternative forms. So if you are in the business of designing a new system, working with a vendor and you are sitting down to lay out the data model, you will see it commonly in three different ways.

The first is an empty table form, where you have the name of the table in a kind of column that runs across the top. And then you have the individual empty columns names, a graphical form where the table names the first line, and the fields that go into essentially the column names, but they become the fields of the records, are listed vertically. One that has an asterisk means that the foreign key reference, that is a data value that exists in some other linked table whenever you see an asterisk in this design.

And then there is a simple text form where you have the name of the table and in parentheses a comma separated list of the fields or the attributes of that table, and again, if you are in this design process with a vendor, they will often send you these kinds of things to get your approval for the nature of your design.

E. F. Codd came up with a set of rules by which you would know whether one of these systems is well designed or not. And they actually extend out through third and fourth normal forms, but we are only going to do the top level here because the complexities get more and more subtle.

The first rule on how you design fully normalized relational models, and normalized, in a sense, means a correct data model. One instance of an object has exactly one value for each attribute that is at each intersection of a row and a column. There is only one value. It is always there, never missing and there are no repeated groups or embedded structure.

Rule number two is that attributes must not have any internal structures. So this is a mistake. For convenience sake you might say, "Well I just want to store the name and the age and the sex." So you create this data formatting where you are going to have the age-sex. The problem is, those are two independent attributes. One changes and one doesn't and now we have to be able to take them apart for purposes for example of looking with one system that may only be interested in one, the gender, and the other system is only interested in the age.

So now we have an internal processing rule by which we have to take the data apart because we inappropriately squashed it together. We put it in one field when it really should have been two separate fields or columns.

If you obey these rules...that each instance has exactly one value for each attribute...and there is no internal structure...you are about a third of a way there for a good design. That is, you are in first normal form by the E. F. Codd rules.

Rule three extends that, to say that every attribute should represent a characteristic of the entire object, and not a characteristic of a limited part of the object. Now, that is a little more subtle, but here is an example that shows you what they mean by that.

It would be absolutely sensible...if your job was to create a little database that had the members of committees in your hospital, and you would say, "Well, I think I want to store the person's name, and I want to store the name of the committee they are on. And I want to store the date that their term expires on their appointment for that committee, and I also need to know the date that they first joined the hospital staff."

Well that may be sensible from your particular view of the data, but this is a mistake because the date you joined the hospital staff is not an attribute of your membership of your committee. That fact of the real world, that aspect of the object of that person, if you will, persists independent of whether it is stored in your data base, and it is not an attribute of being somehow of your committee membership.

So that the correct way to do that, is to store just those things that relate to the committee membership and then store a foreign key reference to, for example, the personnel table, where you get the date that somebody was first employed. So you make that a relationship, a foreign key rather than storing into your table.

Because what happens when you store it in your table and an error is discovered because you are not the gold standard for that information? You have the classic one that we have hundreds of places where we store phone numbers in databases all through the organization. Well, phone numbers change all the time, and nobody knows all of the locations where each number is stored. Part of the data gets updated and part of it does not, so this embedding of items that are not aspects of the object has a very insidious problem of data asynchrony and update.

Relationships are the abstraction of a set of associations that hold, again, systematically. That is, this is up at the level of our theoretically pure object. Not the actual instances necessarily, but when we define an object in the world, we create relationships that hold systemically between all of those classes.

So for example, a patient occupies a bed, a library contains books, a specimen is assayed. Often we use these inverse relationships, and we often pair them so we have bi-directional relationships...you can work that logic in both directions. And most relationships by and large can be stated in an inverse, though it isn't always the case that both of these are created in a database system. But if the library lends a book, one can state it in a kind of passive voice that the book is lent by the library and those might represent actually two different relationships, depending upon the data model.

Relationship Types:

There are a variety of relationship types. The simplest one is the one-to-one relationship. Each state has one, and only one, governor, and each governor governs one and only one state. When you see this, you have to wonder why are we putting these in separate tables. Because if it really is a one-to-one relationship and it always holds, then one might imagine those are actually part of a single abstraction, but there might be good reasons... For example, you might have a table of governors of all states, and might have other characteristics of the state that you store which have nothing to do with the characteristics of the government. One-to-many relationships are shown in these data models. I should say the one-to-one relationship has a single arrow on both ends. That single arrow means there is one occurrence on both sides. A one-to-many relationship has a double arrow, meaning that this, as a record, may have multiple occurrences of this related record. So one dog owner may own many dogs but each dog can have only one owner, at least in this model.

There are however, even more complicated relationships such as the relationships between authors and books. One author may write many books, but each book may have many authors, is written by many authors. And in this case, the data model shows double arrows on both ends of that relationship.

And modeling many-to-many complexities becomes one of the more challenging aspects of relational database design. In fact, the way that one generally has to do this, is to insert yet a third table because of à priori, there is no way, if you have many-to-many relationships...it is hard to establish which records in this table are related to which records in this table. So, by and large, in a relational system, in order to model a many-to-many relationship, you have to create a linking file, which has...the nature of which records are related in those two other files.

Here is an example of a many-to-many relationship, where one drug manufacturer may manufacture many, many drugs and one generic drug may be manufactured by many different manufacturers. So that, what binds those together is a linking table of the license to manufacture in a kind of FDA sense, where we have the manufacturer name, which is a foreign key reference to this table, the generic name...and then the characteristics of this license, for example, the date it was issued. That is how you establish, from these many-to-many relationships, just the ones that apply between the two related tables.

Entity Relationship Diagram:

The overall system design process has escalating layers depending upon how many people are involved in it and how big the system is. For all of the databases that you would build including the ones that you just start on your own desktop, and the ones you are going to fully own, and nobody else is ever going to use it is very much recommended that you begin with a paper and pencil exercise or...we are going to do it with Microsoft Word. We will show you how you can use that as a quick computer assisted software engineering tool to create what is called the entity relationship diagram.
The "entity" or the "object" the relationships are the things that link them together. And what is going to become your table? Don't start, even in Microsoft Access, just naming fields, because you will get in a thicket. In square brackets we have the things that are applicable for multi programmer projects. What a vendor will often do, is create a thing called a " state transition model". And that is to look at the data that is filled into those tables. Under what conditions does it change? Who is allowed to create it? Who is allowed to edit it? What processes have access to it? Those state transition models describe all of the possible values, in essence, that the data can take in the system. And some...although these tend to be reserved for more complicated...some of them take "the day in the life of a data item" view. A flow diagram that says, "Okay, if once an item is created, how does it make its way through the system to various reports, screens, users, analysis routines?" And that is the kind of data flow view that looks at every single object in the tables and decides how it changes over time, and as it moves through an organization. Because physically it may move from one database to another.

<< Prev - Next >>

< Prev		Next >

Mediformatica - The Medical Informatics Portal

Latest Blog Articles

Medical Informatics I: Principles of Database Design - Page 3

Software

Healthcare IT in Egypt

BLOG

VIDEOS

Who's Online

Counter

Medical Informatics

Hospital Information Systems

Videos

Mediformatica's Blog