Your customers want a user-friendly, reliable, scalable, secure and maintainable system. How can you make them happy?
To design an efficient and effective data system, you will need to adhere to some principles. These principles define the underlying rules and play a critical role in directing the organisation’s journey to a target solution. In this article, we will look at the five general design principles for the data system.
Before we proceed, I would like to mention:
- While you read, you will realise that some points apply to any kind of system, not just data systems, which is fine, as I want them to be generic;
- I use the words system and data system interchangeably; they mean the same in this context;
- There are different kinds of data systems, and these general principles apply to them all; however, each type of data system could have specific design principles when looked at on a more granular level.
Be it a product or a service, the primary focus of any design decision should be the users. After all, you wouldn’t want to build a system that will not serve the user’s purpose. It is no different when it comes to designing data systems. The executives want to look at the data in a particular form (visualisations) at a particular time (quarterly, monthly), the Data Analysts in other forms (detailed reports, record level data) and frequency (weekly, daily), the Data Scientists might need the raw (or feature engineered) data to train the ML models on an ad hoc basis. Furthermore, the applications and pipelines that consume data would need it in a different structure and often in near real-time. The Data Operators, system support and maintenance engineers are all users who interact with the data in different shapes and cuts, at different points of the data journey.
It is very tempting to start conceptualising a solution for a given problem. But doing so creates a bias towards a solution that might not be right for some users. So, it is essential to spend enough time understanding the needs of every set of users before thinking of a solution. Conducting workshops, group interviews, or even individual interviews with separate users to understand their roles and responsibilities, their way of working, and the problems they face with the current systems plays an important part in building an effective solution. It is equally important to assess the consumer systems and evaluate how they are designed to engage with the incoming data and produce data for other consumers.
TLDR: User needs should drive design decisions. Conceptualise a solution only after the needs are completely defined and reviewed, and make adjustments in iterations.
The undeniable fact about data systems is that things will inevitably go wrong. Hardware will fail, and it could be for numerous reasons such as age, surrounding conditions, accidents, and even disasters like fire or earthquake. Software failures are common and, in some cases, might go undetected for long periods. Even with the best intentions, humans are known to be unreliable and can introduce faults into systems, such as incorrect configurations or wrong installations. In data systems, such failures can cause availability problems, inaccurate analytics, data loss, and even data breaches, costing organisations not only money but also reputation and customers. We can only make a data system resilient by designing it to prevent, circumvent and recover from those failures.
Mitigate hardware failures by replacing ageing equipment, making data centres safe and less prone to accidents, and provisioning redundant hardware within a region to cover for data centre failures and across geographies to protect against disasters. Design reliable software solutions: follow good coding practices, and focus not only on detecting and reporting errors but also on letting the system handle and resolve them automatically as they occur. Reduce human error by automating deployments and configurations, and publish well-documented guides to follow where automation is not possible.
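As an illustration of letting the system handle transient errors automatically, here is a minimal retry-with-backoff sketch in Python; the operation being retried, the exception type and the delay values are all assumptions for the example, not part of any specific system:

```python
import random
import time


def with_retries(operation, max_attempts=5, base_delay=0.1):
    """Run `operation`, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the error to monitoring/alerting
            # jittered exponential backoff: ~0.1s, ~0.2s, ~0.4s, ...
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))
```

A caller would wrap a flaky read, e.g. `with_retries(lambda: read_from_replica(key))`, so that a brief network blip never reaches the user.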
TLDR: Design systems assuming they will fail, and make sure there is no single point of failure.
Your strategies could be successful in preventing or circumventing complete failures. But what happens over time as the data grows, the user base increases and changes in the business eventually demand that the data system evolve? Would your system be able to hold the volume of data, cope with the huge number of I/O requests, or change easily as the business use-cases evolve? And what if the system doesn’t crash but responds slowly? Would you be able to handle the load on your system during the Easter weekend, or any other day for that matter?
It is therefore vital to think about the options for coping with growth, increase in load over time, and the evolvability of the code while designing systems. Understand the parameters that could define the load on the system, which might be different across architectures and depend heavily on how systems are used. A system could have a low number of users reading or writing a huge amount of data, or it could have a high number of users making constant read/write requests.
These factors determine whether you choose to include a cache store to speed up read/write operations, or scale-out storage to distribute the data and enable parallel processing. Monolithic architectures can be difficult to modify as business needs change, so consider designing domain-driven microservice systems that can be evolved over time with less development effort.
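To make the cache-store idea concrete, here is a minimal read-through cache sketch in Python; the in-memory dictionary, the TTL and the `load_fn` backing-store loader are illustrative assumptions, not a production design:

```python
import time


class ReadThroughCache:
    """Tiny in-memory read-through cache with a TTL (sketch only)."""

    def __init__(self, load_fn, ttl_seconds=60):
        self.load_fn = load_fn  # fetches a value from the slow backing store
        self.ttl = ttl_seconds
        self._store = {}        # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]     # cache hit: skip the backing store entirely
        value = self.load_fn(key)  # cache miss: read through and remember
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value
```

The design choice here is the classic trade: repeated reads of hot keys stop hitting the backing store, at the cost of serving data up to one TTL stale.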
TLDR: Understand how the system might grow over time and design it to handle that growth. Consider a microservices architecture so that it is easier, quicker and more efficient to make changes over time.
Everyone understands how crucial security is, especially for systems that store, process and produce data. Yet, although we appreciate the value of data security, we still encounter plenty of security incidents, sometimes involving large organisations. Statistically speaking, the security puzzle is proving difficult to solve no matter how much we spend on securing systems or how many processes and frameworks we put in place.
When it comes to data systems, you will need to put enough thought into ensuring the data is secure both at rest and in transit, protecting it from unauthorised access, allowing external access only through secured channels and validating each request at the firewall. Build processes to monitor activities, raise alerts on suspicious actions, embed intelligence to detect or even predict vulnerabilities, and carry out regular audits to guarantee compliance. Security is more than tools and processes; every individual in the organisation is responsible for it, and your design should consider every aspect of it.
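One way to picture role-based, deny-by-default access control is a sketch like the following; the roles, actions and permission table are hypothetical examples, not a real policy:

```python
# Role-based access control sketch: map each role to the actions it is
# explicitly granted, and deny everything else by default.
ROLE_PERMISSIONS = {
    "data_analyst": {"read_reports"},
    "data_scientist": {"read_reports", "read_raw"},
    "data_engineer": {"read_reports", "read_raw", "write_raw"},
}


def is_allowed(role, action):
    """Return True only if `action` is explicitly granted to `role`."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

The important property is the default: an unknown role or an ungranted action is denied without any special-case code.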
TLDR: Security is a mindset; think about every aspect of vulnerability when you design the data system. Data is always encrypted, access control is role-based, every service requires a key to allow a request, external access goes through secured channels, requests are monitored, and alerts are raised on suspicious activity.
Designing and developing systems in the lab is one thing; getting that system to run in production is a different ball game altogether. Everything you designed, tested and checked will eventually baffle you if you haven’t put much thought into operations. Cloud solutions have radically changed the role of operations in recent years, reducing the burden of managing the hardware and infrastructure hosting the applications. However, operations still need to deal with large-scale, critical deployments, incident management, monitoring, user management and other administrative tasks.
The fundamental point to keep in mind while designing data systems bound to be operationalised is to make things observable. Logging the data journey and tracing all the events that take place along it makes it much easier for operations to demystify issues and resolve them. Implementing a robust monitoring and alerting solution provides insight into the system’s performance and helps operations react to issues on time. Automatic alerting not only helps humans act on incidents; it also facilitates the immediate execution of automated remedies, making failures invisible to users. Automate manual tasks where possible, such as deployments and configurations. Publish documents and guides for operations users to learn the internals of the system, conduct root cause analysis, or answer business queries.
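The logging and alerting ideas above can be sketched in a few lines; the logger name, the pipeline stages, the threshold and the `notify` hook are all illustrative assumptions:

```python
import logging

# Structured-ish logging: include the pipeline stage and record id in every
# message so an operator can trace a record's journey through the system.
logger = logging.getLogger("pipeline")


def log_event(stage, record_id, status):
    logger.info("stage=%s record_id=%s status=%s", stage, record_id, status)


class ErrorRateAlert:
    """Raise an alert once errors in the current window cross a threshold."""

    def __init__(self, threshold, notify):
        self.threshold = threshold
        self.notify = notify  # e.g. a pager or chat webhook (assumed)
        self.errors = 0

    def record(self, ok):
        if not ok:
            self.errors += 1
            if self.errors == self.threshold:
                self.notify(f"error threshold {self.threshold} reached")

    def reset_window(self):
        # called periodically (e.g. every minute) to start a fresh window
        self.errors = 0
```

In a real system the `notify` hook would page a human or trigger an automated remedy; the point is that the alert fires from the same event stream that operations use for tracing.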
TLDR: Log and trace the events that affect the data through its journey, put automatic monitoring and alerting in place, automate manual tasks and publish documents to facilitate support and maintenance activities.
In recent years, organisations have started to show more interest in setting up systems for making data-driven decisions. Many of them are transitioning towards highly resilient, flexible, scalable and secure solutions. Read my article here on why your organisation should modernise its data platform.
There is a large demand for data-driven digital transformation projects and it will rise in the coming years. Following these five design principles will help you as a designer to focus on the essentials while building data systems and delivering these projects successfully.
If you read books, I recommend Martin Kleppmann’s Designing Data-Intensive Applications. It is for architects, system designers, developers, technical managers, and pretty much anyone who develops or works with applications that store, manage and process data.