If you don’t understand the relational nature of data, you won’t understand its value. Contextualising data is an important step towards gaining a complete understanding of your information. One way to do this is through data modelling. Although the process is very technical, all business stakeholders must be involved in the process (not just the technical teams). In this guest post, Hardus Swanepoel, Solutions Architect at DotModus, presents a brief overview of data modelling and its benefits.
What is Data Modelling?
At a very basic level, data modelling is the process of creating simple visual representations of a complex information system and how it relates to different data points and structures. The ultimate goal is to represent the type of data used and stored, the relationships between data, and the rules. Typically, diagrams, symbols and text are all used to represent data and illustrate the connections.
Why use Data Modelling?
With data modelling, all stakeholders in a business can see and understand what data is available, what has been defined and what is being requested. This will prevent duplication of tasks or data and allows for instant data identification of business requirements. Should a new requirement come from business the delivery team would be able to instantly identify if the data is already available or not, if it is available, they will know if they’ve already built reporting on it, ie. how mature is the data in use. In the long run, this will reduce costs and errors in development while improving database and application performance.
What are Data Models?
The data modelling process assists with defining which data should be identified and provides a framework to group these data elements according to logical application thereof. There are three data model categories:
- Conceptual data models
The data requirements as defined by the business stakeholders. This is still structureless data and is based on business requirements.
- Logical data models
In this model, the logical application of the conceptual data models is defined. These data models should have schemas, data attributes and the relationships that define the interactions between the data sets.
- Physical data models
These are the physical implementation of the designs defined by the logical data models in the warehouse. The tables would have schemas with primary keys, foreign keys, column data type and constraints and so forth.
The conceptual data models are abstract representations. They’re high-level diagrams that are used to define and present data elements and their relationships to each other. These models don’t contain technical details, such as attributes, data types, etc.
Conceptual data models are used when defining the business requirements of the database and presenting concepts. These models aren’t used to communicate with technical teams as they are very simple.
Let’s take users for a web-based retail application, for example; at a high level, this data set will have the following entities:
Based on this feature set, the conceptual model could be presented as per the below image. These models are akin to a whiteboard discovery session to help define the outline of the picture we are trying to create.
Logical data models expand on the conceptual data modes, with the primary aim to apply normalisation techniques to reach the third normal form. This will allow data storage that avoids duplication, promotes referential integrity, and reduces the risk of incorrect or missing data.
Tables could be broken down into multiple tables, if required, during the normalisation process. Once normalisation has been applied to define the attributes of each entity, relationships between the models are created defining how data will interact between the tables. The entities have now become tables and will be referred to as such.
When a table is normalised, the attributes and data types of each column should be clearly defined. Primary keys should also be identified and documented at this stage. Below is the logical model for the tables based on the normalisation applied to the entities identified in the conceptual model.
Physical data models are derived from the logical data models for a particular warehouse system or RDBMS. These models take into account the specific technical details of the product. At this stage in the modelling process, we’re applying the actual table and column names as we would use them in the warehouse, rather than specifying entity and attribute names. Doing this allows the models to be created within the conventions and limitations of the warehouse product that will be used for implementation.
Data types and constraints are also provided with these models. A physical model should be easily transformable into SQL scripts to generate the objects in the warehouse. Below is an example of the physical model derived from the logical model.
Data Modelling Methodologies
So how do we approach Data Modeling? The two proven methodologies that can be followed are:
This process works from the existing physical data models that are already in place, then from there, reengineer the other models.
This approach starts from the conceptual models and is ideal for brand new implementations.
It’s advised that the two methodologies are not mixed, though the Top-Down methodology would be the logical driver.
Data Modelling in Practice
How do we actually do Data Modelling, what ceremonies and technology should be used? Is there a defined roadmap that can be followed?
What does a Data Modelling roadmap look like? There's a ton of them available on the internet because there are a ton of companies that have done it their way. As such, a client would need to define their own roadmap to success. A high-level roadmap could look like this:
- Appoint a Data Modelling Project Owner.
- Provide PO with the required support.
- PO to define the first approach:
- Specific business /internal client?
- Specific workstream?
- Ensure that this decision leads to a methodological approach to any subsequent streams.
- Ensure to involve business in these discussions:
- This is not an IT approach, but a business process.
- Follow the Bottom-Up approach:
- Identify existing data structures.
- Define usage of this data.
- Play it back to business to ensure that it is accurate.
- Document Physical Data Models.
- Document Logical Data Models.
- Document Conceptual Models.
- Review all items with the Architecture board to ensure accuracy.
Tools in the Market
Which products could be used to facilitate the process? There are hundreds of different tools that can be used, the below tools are used at DotModus for data modelling:
- DB Designer
- MySQL Workbench
Are these the best tools?
Due to the nature of projects at DotModus these tools offer the most accessible features that can be used across multiple projects and clients without causing licensing overhead for the clients.
Are all of these tools needed?
No, it depends on the nature of the project and what technologies the client might already be using.
Which of these tools should we use?
That can be defined during the process, allowing for flexibility in approach rather than being stuck in without having a full understanding of the entire process and journey.
In short, there will be known knowns, known unknowns and unknown unknowns. At the start of this process, there will be many more of the latter two. Choosing a specific technology or application with that much uncertainty could become a detrimental factor down the line.
Though this isn’t a unique concept to any business, we hope to have provided insight to those that needed it. If you would like to see how data modelling could help your business, feel free to reach out to us by clicking here.