Software components of data mart




















The subset of data held in a data mart typically aligns with a particular business unit like sales, finance, or marketing. Data marts accelerate business processes by allowing access to relevant information in a data warehouse or operational data store within days, as opposed to months or longer. Because a data mart only contains the data applicable to a certain business area, it is a cost-effective way to gain actionable insights quickly.

Data marts and data warehouses are both highly structured repositories where data is stored and managed until it is needed. However, they differ in the scope of data stored: data warehouses are built to serve as the central store of data for the entire business, whereas a data mart fulfills the needs of a specific division or business function.

Because a data warehouse contains data for the entire company, it is best practice to strictly control who can access it. Additionally, finding and querying just the data you need in a data warehouse can be a difficult task for business users.

Thus, the primary purpose of a data mart is to isolate—or partition—a smaller set of data from a whole to provide easier data access for the end consumers. A data mart can be created from an existing data warehouse—the top-down approach—or from other sources, such as internal operational systems or external data.

Similar to a data warehouse, a data mart is a relational database that stores transactional data (time value, numerical order, reference to one or more objects) in columns and rows, making it easy to organize and access. On the other hand, separate business units may create their own data marts based on their own data requirements.

If business needs dictate, multiple data marts can be merged together to create a single data warehouse. This is the bottom-up development approach. There are three types of data marts: dependent, independent, and hybrid.

They are categorized based on their relation to the data warehouse and the data sources that are used to create the system. A dependent data mart is created from an existing enterprise data warehouse. It is the top-down approach that begins with storing all business data in one central location, then extracts a clearly defined portion of the data when needed for analysis.

To form a data mart, a specific set of data is aggregated (formed into a cluster) from the warehouse, restructured, then loaded to the data mart where it can be queried. It can be a logical view or physical subset of the data warehouse. To suit the requirements of our organization, we arrange these building blocks in a particular way; we may want to bolster one part with extra tools and services.
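For the dependent data mart described above, the difference between a logical view and a physical subset can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a production design: the table `warehouse_orders`, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical warehouse table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_orders (
        order_id   INTEGER PRIMARY KEY,
        department TEXT,
        region     TEXT,
        amount     REAL
    );
    INSERT INTO warehouse_orders VALUES
        (1, 'sales',   'EMEA', 120.0),
        (2, 'finance', 'EMEA',  75.5),
        (3, 'sales',   'APAC', 310.0),
        (4, 'sales',   'EMEA',  80.0);

    -- Logical data mart: a view that always reflects the warehouse.
    CREATE VIEW sales_mart_view AS
        SELECT region, SUM(amount) AS total_amount
        FROM warehouse_orders
        WHERE department = 'sales'
        GROUP BY region;

    -- Physical data mart: an aggregated copy loaded at a point in time.
    CREATE TABLE sales_mart_table AS
        SELECT region, SUM(amount) AS total_amount
        FROM warehouse_orders
        WHERE department = 'sales'
        GROUP BY region;
""")

# A row added to the warehouse afterwards shows up in the view,
# but not in the physical copy until it is reloaded.
conn.execute("INSERT INTO warehouse_orders VALUES (5, 'sales', 'APAC', 40.0)")

view_rows = conn.execute(
    "SELECT region, total_amount FROM sales_mart_view ORDER BY region"
).fetchall()
table_rows = conn.execute(
    "SELECT region, total_amount FROM sales_mart_table ORDER BY region"
).fetchall()
```

The view costs nothing to keep current but recomputes the aggregation on every query; the physical table answers faster but has to be reloaded to pick up new warehouse data.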

All of this depends on our circumstances. The figure shows the essential elements of a typical warehouse. The Source Data component appears on the left. The Data Staging element serves as the next building block. In the middle is the Data Storage component, which handles the data warehouse's data. This element not only stores and manages the data; it also keeps track of the data using the metadata repository. The Information Delivery component, shown on the right, consists of all the different ways of making the information from the data warehouse available to the users.

Production Data: This type of data comes from the different operational systems of the enterprise. Based on the data requirements in the data warehouse, we choose segments of the data from the various operational systems.

Internal Data: In each organization, users keep their "private" spreadsheets, reports, customer profiles, and sometimes even departmental databases. This is the internal data, part of which could be useful in a data warehouse.

Archived Data: Operational systems are mainly intended to run the current business. In every operational system, we periodically take the old data and store it in archived files.

External Data: Most executives depend on information from external sources for a large percentage of the information they use. They use statistics relating to their industry produced by external organizations.

After we have extracted data from various operational systems and external sources, we have to prepare the files for storing in the data warehouse. The extracted data coming from several different sources needs to be changed, converted, and made ready in a format that is suitable for querying and analysis.

We have to employ the appropriate techniques for each data source. If data extraction for a data warehouse poses big challenges, data transformation presents even greater challenges. We perform several individual tasks as part of data transformation. First, we clean the data extracted from each source.

Cleaning may involve correcting misspellings, providing default values for missing data elements, or eliminating duplicates when we bring in the same data from various source systems. Standardization of data components forms a large part of data transformation. Data transformation involves many forms of combining pieces of data from different sources. We combine data from a single source record or related data parts from many source records. On the other hand, data transformation also involves purging source data that is not useful and separating out source records into new combinations.
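The three cleaning tasks named above (fixing misspellings, supplying defaults for missing values, and removing duplicates) can be sketched in a few lines of Python. The field names, the `SPELLING_FIXES` lookup, and the `DEFAULT_COUNTRY` value are assumptions made for illustration; a real staging area would drive these from rules maintained per source system.

```python
# Hypothetical lookup of known misspellings and a hypothetical default value.
SPELLING_FIXES = {"Califronia": "California", "Nwe York": "New York"}
DEFAULT_COUNTRY = "UNKNOWN"


def clean_records(records):
    """Return cleaned copies of the records, keeping the first row per customer_id."""
    cleaned = []
    seen_ids = set()
    for record in records:
        customer_id = record["customer_id"]
        # Eliminate duplicates that arrive from more than one source system.
        if customer_id in seen_ids:
            continue
        seen_ids.add(customer_id)

        # Trim whitespace, then correct known misspellings.
        state = (record.get("state") or "").strip()
        state = SPELLING_FIXES.get(state, state)

        # Provide a default for a missing data element.
        country = record.get("country") or DEFAULT_COUNTRY

        cleaned.append(
            {"customer_id": customer_id, "state": state, "country": country}
        )
    return cleaned


raw = [
    {"customer_id": 1, "state": "Califronia", "country": "US"},
    {"customer_id": 2, "state": " New York ", "country": None},
    {"customer_id": 1, "state": "California", "country": "US"},  # duplicate id
]
cleaned = clean_records(raw)
```

Keeping the first record per key is only one possible deduplication policy; preferring the most recent or most complete record is equally plausible and depends on the sources involved.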

Sorting and merging of data take place on a large scale in the data staging area. When the data transformation function ends, we have a collection of integrated data that is cleaned, standardized, and summarized. When we complete the structure and construction of the data warehouse and go live for the first time, we do the initial loading of the information into the data warehouse storage. The initial load moves high volumes of data using up a substantial amount of time.

Data storage for the data warehouse is a separate repository. The data repositories for the operational systems generally include only the current data. Also, these data repositories hold data structured in highly normalized form for fast and efficient processing. The information delivery element is used to enable the process of subscribing to data warehouse files and having them transferred to one or more destinations according to some customer-specified scheduling algorithm.

Metadata in a data warehouse is similar to the data dictionary or the data catalog in a database management system. An advantage of the independent data mart model is that individual business units can run the data mart that suits them best.

Of course, with this independence comes the need for technical administrative expertise at each data mart. Plus, if data will need to be aggregated across data marts — for executive-level reporting, for instance — you will need to construct queries that access multiple data marts.
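A query that spans independent data marts might look like the following sketch. It assumes both marts happen to be SQLite databases that can be attached to one connection; the schema names `sales_mart` and `marketing_mart`, the tables, and the figures are hypothetical. Marts running on different database products would need a federation layer or an export step instead.

```python
import sqlite3

# Each ATTACH of ':memory:' creates a separate in-memory database,
# standing in here for two independently run data marts.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sales_mart")
conn.execute("ATTACH DATABASE ':memory:' AS marketing_mart")

conn.executescript("""
    CREATE TABLE sales_mart.revenue (region TEXT, amount REAL);
    CREATE TABLE marketing_mart.spend (region TEXT, amount REAL);
    INSERT INTO sales_mart.revenue VALUES ('EMEA', 500.0), ('APAC', 300.0);
    INSERT INTO marketing_mart.spend VALUES ('EMEA', 100.0), ('APAC', 120.0);
""")

# Executive-level report: revenue and marketing spend side by side per region.
report = conn.execute("""
    SELECT r.region, r.amount AS revenue, s.amount AS spend
    FROM sales_mart.revenue AS r
    JOIN marketing_mart.spend AS s ON s.region = r.region
    ORDER BY r.region
""").fetchall()
```

The join only works because both marts use the same region codes, which is exactly the kind of shared convention independent marts have to agree on.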

Some organizations find it practical to consider a hybrid model where some data marts are dependent on a central warehouse and some exist on their own. For example, it might be more efficient to use this model as a transitionary step for new data marts. New subject-specific data sources might be easier to deploy as independent data marts. Once they have proven their value, they can be deployed through to the central data warehouse, if needed. Or the hybrid model might be a good path to integrate acquisitions.

A significant consideration is the human resources required to run the hybrid model. It requires technical administrative expertise at both the central data warehouse and at the data mart level. Which type of data mart a business chooses depends on a lot of factors, including how the company is organized. These four questions can help an organization determine which type suits it best:

While a data warehouse is a repository for all the data that helps a business run, a data mart is a condensed subset of business data designed for a specific purpose, business unit or department. Data marts draw on fewer, more specialized data sources. A data mart strategy might not need to include a data warehouse. Instead, the data warehouse might be the aggregate of all your data marts.

Typically, the raw data in data lakes has a lot less structure and has yet to be cleaned and normalized. Data marts, on the other hand, are the result of highly structured, cleaned and normalized data. More importantly, data marts are designed to provide specific solutions to individual groups, while data lakes are meant for more open-ended analyses, even unanticipated ones.

A database stores data the organization owns, and often third-party licensed data, in a way that can be retrieved via queries. Structured Query Language (SQL) is the most prevalent way that data is retrieved from a database. A database might feed multiple data marts and, depending on the size of your data set and your data strategy, a data mart might draw from more than one database.

There are three interrelated schema-level data architectures for data marts: star, snowflake and denormalized tables. The star structure is the most straightforward of the three, and thus reduces the complexity of deploying data marts. In the star structure, business-level data is broken out into tables of facts (for example, sales data). These tables interact with relevant dimensions.

For example, a sales facts table may relate directly to a dimension table that lists products. To visualize the schema and understand where the star label comes from, look at a representation of a sales facts table with at least four attributes: date, location, product and quantity.

Similarly, a dimension of available products provides a centralized and official list of all products. These are connected to the sales facts table via a product identifier. As these dimensions blossom out, you start to see a star pattern where a central table interacts with a single one-dimensional layer of related tables.
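The star layout described above can be sketched as one fact table joined directly to its dimension tables. The table names (`fact_sales`, `dim_product`, `dim_location`) and the sample rows are illustrative assumptions; the fact table carries the four attributes mentioned in the text: date, location, product and quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one official list each of products and locations.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT
    );
    CREATE TABLE dim_location (
        location_id INTEGER PRIMARY KEY,
        city        TEXT
    );

    -- Central fact table: date, location, product and quantity.
    CREATE TABLE fact_sales (
        sale_date   TEXT,
        location_id INTEGER REFERENCES dim_location(location_id),
        product_id  INTEGER REFERENCES dim_product(product_id),
        quantity    INTEGER
    );

    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_location VALUES (1, 'Tokyo'), (2, 'Berlin');
    INSERT INTO fact_sales VALUES
        ('2024-01-05', 1, 1, 3),
        ('2024-01-05', 2, 1, 2),
        ('2024-01-06', 1, 2, 7);
""")

# Each dimension is one join away from the fact table.
quantity_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.quantity) AS total_quantity
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```

Because every dimension connects to the fact table through a single identifier, a report never needs more than one join per dimension.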

Alternatively, and perhaps more realistically, consider the case where a star-structured data mart contains dimension tables that are themselves subject to further dimensions. This is the snowflake structure, and reports that span those layers need additional joins. Depending on how much data is involved, these joins will reduce the responsiveness of the reports.

The extra joins could also be a significant resource hog. An alternative is to use denormalized tables, eliminating the joins and thus performing queries more efficiently. A denormalized table structure brings together all the data needed for a data mart report into one table, which will produce faster queries but will likely generate redundant data.

While this redundant data makes inserts and updates more expensive, the bet with denormalized tables is that the efficiencies of queries outweigh those costs. The biggest advantage of data marts is efficiency, both in terms of costs and data access. Data marts cost much less to deploy than a data warehouse and access to data is much faster because data marts refer to smaller datasets.
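The trade-off between layered dimensions and a denormalized table can be made concrete with a small sketch. All table names and rows here are hypothetical: a product dimension that has its own category dimension, and a flat table `sales_flat` built from the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Layered dimensions: products belong to categories.
    CREATE TABLE dim_category (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id  INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (product_id INTEGER, quantity INTEGER);

    INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO dim_product VALUES (1, 'Widget', 1), (2, 'Gadget', 1), (3, 'App', 2);
    INSERT INTO fact_sales VALUES (1, 3), (2, 7), (3, 4), (1, 2);

    -- Denormalized: one wide table; names are repeated on every row.
    CREATE TABLE sales_flat AS
        SELECT c.category_name, p.product_name, f.quantity
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_category AS c ON c.category_id = p.category_id;
""")

# The layered schema needs two joins to report by category.
layered_totals = conn.execute("""
    SELECT c.category_name, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

# The denormalized table answers the same question with no joins.
flat_totals = conn.execute("""
    SELECT category_name, SUM(quantity)
    FROM sales_flat
    GROUP BY category_name
    ORDER BY category_name
""").fetchall()

# Renaming a category: one row in the dimension, every matching row in the flat table.
layered_rows_changed = conn.execute(
    "UPDATE dim_category SET category_name = 'Devices' "
    "WHERE category_name = 'Hardware'"
).rowcount
flat_rows_changed = conn.execute(
    "UPDATE sales_flat SET category_name = 'Devices' "
    "WHERE category_name = 'Hardware'"
).rowcount
```

Both queries return the same totals; the difference shows up in the update counts, which is the cost the denormalized design accepts in exchange for join-free reads.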

Queries into a central data warehouse can be long and arcane because they must work around irrelevant data. A well-constructed data mart strategy can provide business unit and departmental leaders with very fast access to the data they need. Some queries that were previously conducted live can be presented as scheduled queries in a data mart.

This gives team members access to the information they need while using only a small fraction of the computing resources previously used. But even live-updated data is delivered more efficiently through a data mart simply because it is drawn from a more focused set of data.
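One way to read the scheduled-query idea above is as a summary table that is recomputed at intervals and then served to readers without re-running the aggregation. The sketch below assumes that interpretation; the tables `orders` and `regional_summary` and the function names are hypothetical, and the scheduler itself (cron, a workflow tool, and so on) is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES ('EMEA', 100.0), ('EMEA', 50.0), ('APAC', 70.0);
    CREATE TABLE regional_summary (region TEXT PRIMARY KEY, total_amount REAL);
""")


def refresh_regional_summary(connection):
    """Recompute the summary; meant to be invoked by an external scheduler."""
    with connection:  # commits on success, rolls back on error
        connection.execute("DELETE FROM regional_summary")
        connection.execute("""
            INSERT INTO regional_summary
            SELECT region, SUM(amount) FROM orders GROUP BY region
        """)


def read_summary(connection):
    """Cheap read used by reports; it never touches the detailed rows."""
    return connection.execute(
        "SELECT region, total_amount FROM regional_summary ORDER BY region"
    ).fetchall()


refresh_regional_summary(conn)
first_read = read_summary(conn)

# New detail rows are invisible to readers until the next scheduled refresh.
conn.execute("INSERT INTO orders VALUES ('APAC', 30.0)")
stale_read = read_summary(conn)

refresh_regional_summary(conn)
fresh_read = read_summary(conn)
```

The price of the cheaper reads is staleness between refreshes, so this fits reports that do not need up-to-the-minute figures.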

Another advantage of data marts is they can be independent of each other, so an outage at the central data warehouse does not have to affect individual data marts. And when a data mart includes licensed third-party data, a key advantage is that the license cost should be lower because the user base for the data is smaller than if it were in a data warehouse. Since a data mart contains only the data needed by a single business group, it does not on its own provide visibility into the broader set of data a business might need.

Similarly, in an independent data mart model that excludes a central data warehouse, the business may not have ready access to cross-data-mart reporting for certain kinds of high-level reports. Comprehensive requirement-gathering helps to plan out the design that will result in the logical, physical and technical characteristics of a strong data mart deployment. Use this information to plan out what kinds of data will appear in which data marts. During this phase, most organizations decide on their data mart architecture and make other decisions that will have a long-lasting impact on how the data marts are used.

If the organization has a data warehouse, review the existing data warehouse schema as well as licensed third-party data to determine criteria that will enable data to flow properly in the new data mart plan. At the same time, consider how the existing data warehouse technology and architecture can support your data mart needs.

Some modifications will likely need to be made, both in the system itself as well as the licensing. When designing data marts for far-flung regional branches, consider potential service interruptions.

Not all parts of the world have excellent connectivity; the need to keep people working might dictate key elements of data mart design. This phase is where you make decisions and purchases to create and deploy the physical and logical structures of the data mart architecture.

Determine what database to use. If the data mart strategy is based on existing data warehouse technology, plan out and deploy needed modifications. Though it may seem early, this is also a good time to consider future administrative activities. Make a specific plan for logging and analyzing user activity and access statistics including load and response times.
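A plan for logging user activity and response times can start as a thin wrapper around query execution. The following is only a sketch under assumptions: the `timed_query` helper, the in-memory `access_log` list, and the sample table are hypothetical, and a real deployment would more likely rely on the database's own audit and monitoring features.

```python
import sqlite3
import time

access_log = []  # stand-in for a persistent log store


def timed_query(connection, user, sql, params=()):
    """Run a query and record who ran it, how many rows came back, and how long it took."""
    start = time.perf_counter()
    rows = connection.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    access_log.append(
        {"user": user, "sql": sql, "row_count": len(rows), "seconds": elapsed}
    )
    return rows


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('EMEA', 100.0), ('APAC', 70.0);
""")

rows = timed_query(
    conn,
    "analyst_1",
    "SELECT region, amount FROM sales WHERE region = ?",
    ("EMEA",),
)
```

Entries collected this way can later be grouped by user or by query text to spot slow reports and heavy consumers.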

This will be enormously helpful when fielding support questions. Backups and redundancy are also critical to build into your system. Questions to consider include: Where is the data that you need to include (owned as well as licensed)? What are the terms (both legal and technical) under which that data is available?

What fields will be used to join and connect disparately sourced datasets? How will data be cleaned and normalized? Also, identify what components of your data mart will require live querying versus which can be scheduled out.

This is the phase when the planned subsets of your overall data warehouse can first be accessed. Set up specific queries and reports so that they can be accessed through the data mart interface.


