When data professionals ask me how to keep their skill sets competitive, I direct them towards the horizon. How is the industry moving? Which technologies do they see less frequently on projects? Which technology stalwarts remain? A foundational piece of big data is the database, which stores, organizes, queries, and analyzes an organization's data. Database management systems have long been dominated by two models: relational databases, which handle structured data (e.g., data organized into rows and columns), and non-relational databases, which handle unstructured data (e.g., videos and social media posts).
In Choosing the Right Database for Your Enterprise, a new ebook I created with Udemy for Business, we look at why the database piece of your data puzzle requires robust data architecture knowledge, how to make the right database choice for your application, and which skills managers should prioritize so their teams gain expertise in building and scaling a data infrastructure. Keep your team up to speed on all the latest database skills with an online subscription to Udemy for Business courses.
As part of the ebook, I share four database trends I'm keeping a close eye on. While the relational versus non-relational choice will likely be around for some time to come, it's always a good idea to keep an eye on the future when making technology decisions in the present. Download Ebook: Choosing the Right Database for Your Enterprise.
1. The CAP theorem is getting fuzzy
Can you have your cake and eat it, too? Recent advances mean you don't necessarily need to make the usual trade-offs in the CAP theorem triangle. The CAP theorem states that a distributed database system can guarantee at most two of three properties: Consistency, Availability, and Partition Tolerance. Because network partitions can't be ruled out in practice, a team usually has to decide whether to compromise consistency or availability while a partition lasts. For more on the CAP theorem and how to use it when choosing a database for your organization, download the ebook.
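To make the trade-off concrete, here is a minimal Python sketch of a hypothetical two-replica store (`ToyReplicatedStore` is an illustrative name, not a real library). It is a teaching toy, not how any production database implements replication: during a simulated network partition, "CP" mode refuses writes to stay consistent, while "AP" mode accepts them and lets the replicas disagree.

```python
from typing import Dict, Optional


class ToyReplicatedStore:
    """A hypothetical two-replica store that shows the CAP choice during a partition.

    mode="CP" refuses writes it cannot replicate (stays consistent, loses availability).
    mode="AP" accepts writes locally (stays available, replicas may disagree).
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas: Dict[str, Optional[str]] = {"a": None, "b": None}
        self.partitioned = False  # True while the replicas cannot reach each other

    def write(self, replica: str, value: str) -> bool:
        """Write through one replica; return False if the write is refused."""
        if not self.partitioned:
            # Healthy network: copy the value to every replica.
            for name in self.replicas:
                self.replicas[name] = value
            return True
        if self.mode == "CP":
            # Choose consistency: reject a write the other replica can't see.
            return False
        # Choose availability: accept the write locally and let replicas diverge.
        self.replicas[replica] = value
        return True

    def read(self, replica: str) -> Optional[str]:
        return self.replicas[replica]


for mode in ("CP", "AP"):
    store = ToyReplicatedStore(mode)
    store.write("a", "v1")
    store.partitioned = True
    accepted = store.write("a", "v2")
    print(mode, accepted, store.read("a"), store.read("b"))
# CP False v1 v1
# AP True v2 v1
```

Neither mode is "correct"; the sketch only shows which guarantee each one gives up once the network splits.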
Some big data experts have even proposed replacing the CAP theorem with the PIE (platform flexibility, infinite scale, and efficiency) theorem as a better reflection of the trade-offs modern system architects must make. Meanwhile, the old categories are blurring. Amazon Redshift, for example, is a relational data warehouse that is fully distributed, horizontally scalable, and highly reliable. MySQL and PostgreSQL can be sharded to gain the horizontal scalability usually associated with non-relational databases. And most database systems can already provide high availability, even if it comes at the cost of a modest trade-off in consistency.
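Sharding spreads rows across several database servers according to a key. The sketch below assumes simple hash-based routing over three hypothetical shards; the shard names and the `shard_for` helper are illustrative, not a MySQL or PostgreSQL feature, and real deployments typically rely on dedicated tooling or extensions rather than hand-rolled routing.

```python
import hashlib
from collections import Counter
from typing import List

# Hypothetical shard names; in practice each would be a separate MySQL or
# PostgreSQL server holding a slice of the rows.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(key: str, shards: List[str] = SHARDS) -> str:
    """Route a key to a shard with simple modulo hashing."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


customer_ids = [f"customer-{n}" for n in range(1000)]
placement = Counter(shard_for(cid) for cid in customer_ids)
print(placement)  # keys per shard; with a good hash the counts come out roughly even
```

SHA-256 is used instead of Python's built-in `hash()` because string hashing is randomized per process, so routing based on it would change between runs. Plain modulo routing also moves most keys when a shard is added; consistent hashing is the usual way to limit that reshuffling.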
2. The growth of data lakes: Making data structured
Increasingly, pools of raw data files with no schema imposed on them (such as CSV or TSV files) are being stored in large cloud repositories called "data lakes." Services such as AWS Glue can impart structure on this data by inferring a schema, so it can be queried as you would query a relational database, without making a copy of the data in the process. This approach combines the massive scalability of storing raw data in the cloud with the ability to query that data as you would a relational database.
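The idea behind this "schema-on-read" approach can be sketched in a few lines of Python. This is a hypothetical illustration, not AWS Glue's API: `infer_schema` guesses each column's type from a sample of raw CSV text (loosely what a crawler does), and `query` applies that schema while scanning the file, so the data is never copied into a database.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List

# Hypothetical raw file; in a data lake this would be an object in cloud storage.
RAW_CSV = """order_id,region,amount
1001,us-east,19.99
1002,eu-west,5.00
1003,us-east,42.50
"""

Row = Dict[str, object]


def infer_type(values: List[str]) -> Callable[[str], object]:
    """Pick int, then float, else str: the narrowest type every value parses as."""
    if not values:
        return str
    for cast in (int, float):
        try:
            for v in values:
                cast(v)
            return cast
        except ValueError:
            continue
    return str


def infer_schema(text: str, sample_rows: int = 100) -> Dict[str, Callable[[str], object]]:
    """Infer a column -> type mapping from a sample, like a (much simpler) crawler."""
    reader = csv.DictReader(io.StringIO(text))
    sample = [row for _, row in zip(range(sample_rows), reader)]
    return {col: infer_type([r[col] for r in sample]) for col in reader.fieldnames or []}


def query(text: str, schema: Dict[str, Callable[[str], object]],
          predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Apply the schema while scanning the raw text; nothing is copied into a database."""
    for raw in csv.DictReader(io.StringIO(text)):
        row = {col: schema[col](val) for col, val in raw.items()}
        if predicate(row):
            yield row


schema = infer_schema(RAW_CSV)
print({col: t.__name__ for col, t in schema.items()})
# {'order_id': 'int', 'region': 'str', 'amount': 'float'}
print(list(query(RAW_CSV, schema, lambda r: r["region"] == "us-east" and r["amount"] > 20)))
# [{'order_id': 1003, 'region': 'us-east', 'amount': 42.5}]
```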
3. Elasticsearch: Search-engine based databases
This type of non-relational database uses indexes to find data by its characteristics, typically inverted indexes that map each term to the documents containing it. A popular example is Elasticsearch, an efficient, scalable data store as well as a capable search engine. Many organizations use Elasticsearch to store numerical data and rely on the broader Elastic Stack, which includes Kibana for visualization, to analyze it. The stack also offers machine learning features that automatically flag anomalous data, along with tools such as Logstash and Beats for moving data into Elasticsearch at massive scale.
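Elasticsearch is built on Apache Lucene, whose core data structure is the inverted index: a map from each term to the documents that contain it. The minimal Python sketch below shows only that idea. The `InvertedIndex` class is hypothetical and skips everything a real engine adds, such as relevance scoring, text analyzers, and distribution across nodes.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # term -> doc ids
        self.docs: Dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        """Return ids of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids creating empty entries in the defaultdict for unknown terms.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)


index = InvertedIndex()
index.add(1, "Disk latency spike on db-01")
index.add(2, "CPU spike on web-03")
index.add(3, "Disk full on db-02")
print(index.search("disk spike"))  # [1]
print(index.search("spike"))  # [1, 2]
```

Looking up a term touches only the documents listed under it, which is why search stays fast as the collection grows.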
4. Time-series databases
Analyzing data for trends over time requires indexing your data by time, in time order. Like graph databases, time-series databases serve a specialized need, but a common one. These systems are still emerging, but their rise speaks to a larger trend: using many different, specialized databases for the many different, specialized challenges your organization faces. Examples of time-series databases include InfluxDB (from InfluxData), kdb+, and Prometheus.
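The core idea of indexing by time can be sketched with a sorted list and binary search. The `TimeSeries` class below is a hypothetical, in-memory illustration, not how InfluxDB, kdb+, or Prometheus store data internally. It keeps points in time order, answers range queries with `bisect`, and downsamples points into fixed-width buckets, a common time-series operation.

```python
import bisect
from typing import Dict, List, Tuple


class TimeSeries:
    """A hypothetical in-memory series kept sorted by timestamp (in seconds)."""

    def __init__(self) -> None:
        self.timestamps: List[int] = []
        self.values: List[float] = []

    def append(self, ts: int, value: float) -> None:
        # Most points arrive in time order (an append at the end); binary search
        # still places a late-arriving point in the right spot.
        i = bisect.bisect_right(self.timestamps, ts)
        self.timestamps.insert(i, ts)
        self.values.insert(i, value)

    def between(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Return points with start <= ts < end using binary search on the time index."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

    def downsample(self, start: int, end: int, bucket: int) -> List[Tuple[int, float]]:
        """Average points into fixed-width buckets, e.g. bucket=60 for per-minute means."""
        groups: Dict[int, List[float]] = {}
        for ts, value in self.between(start, end):
            groups.setdefault(ts - ts % bucket, []).append(value)  # align to bucket start
        return [(t, sum(vs) / len(vs)) for t, vs in sorted(groups.items())]


series = TimeSeries()
for ts, value in [(0, 1.0), (30, 3.0), (60, 10.0), (90, 20.0), (120, 5.0)]:
    series.append(ts, value)
print(series.between(30, 120))  # [(30, 3.0), (60, 10.0), (90, 20.0)]
print(series.downsample(0, 180, 60))  # [(0, 2.0), (60, 15.0), (120, 5.0)]
```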
These emerging technologies are worth investigating to understand how they may best help your organization manage complex data and grow its data center capacity exponentially. Get a better understanding of foundational data architecture by downloading my latest ebook: Choosing the Right Database for Your Enterprise.
About the author:
Frank Kane is a Udemy Instructor and founder of Sundog Software. Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning.
About Udemy for Business:
Udemy for Business is a learning platform that helps companies stay competitive in today’s rapidly changing workplace by offering fresh, relevant on-demand learning content, curated from the Udemy marketplace. Our mission is to help employees do whatever comes next—whether that’s the next project to do, skill to learn, or role to master. We’d love to partner with you on your employee development needs. Get in touch with us at firstname.lastname@example.org