{"id":9584,"date":"2016-05-17T15:43:01","date_gmt":"2016-05-17T19:43:01","guid":{"rendered":"http:\/\/www.iri.com\/blog\/?p=9584"},"modified":"2023-02-10T15:57:36","modified_gmt":"2023-02-10T20:57:36","slug":"the-use-of-data-lakes","status":"publish","type":"post","link":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/","title":{"rendered":"The Use of Data Lakes"},"content":{"rendered":"<p>Has your organization considered using a data lake? This article explains what a data lake is, and how you can fish its murky depths for value in an architecture optimized for your needs. IRI Voracity users should also be able to assess this approach relative to <a href=\"https:\/\/www.iri.com\/solutions\/data-integration#paradigms\">other paradigms<\/a> the platform supports, including LDW, EDW, ODS, and eventually, self-service BI, too.<\/p>\n<h2>What Is a Data Lake?<\/h2>\n<p>Data lakes are environments for gathering and storing data for experimentation. The term data lake was coined by Pentaho CTO James Dixon. He used the term to compare data that was cleansed, packaged, and structured &#8212; like what was found in a data mart (or bottled water) &#8212; to data in its more natural state (in a large, real body of water).<\/p>\n<p>&#8220;The contents of the data lake stream in from one or more sources to fill the lake, and various users of the lake can come to examine, dive in, or take samples,\u201d Dixon blogged.<\/p>\n<p>Gartner refers to the data lake as &#8220;a collection of storage instances of various data assets additional to the originating data sources. These assets are stored in a near-exact, or even exact, copy of the source format.\u201d<\/p>\n<p>Thus, the data lake is a single store of enterprise data that includes both raw data (which implies an exact copy of source data) and transformed data used for reporting and analytics. 
Some want the data lake to replace the traditional data warehouse, while others see it as more of a staging area to feed data into existing data warehouse architectures.<\/p>\n<p>Data lakes can exist in a file system, a Hadoop fabric, or a cloud storage service like Amazon S3.<\/p>\n<h2>Why Use a Data Lake?<\/h2>\n<p>Traditional architectures silo information into buckets that provide only a fraction of the insight that might be derived from a larger collection of data. Data marts and warehouses require data to be identified, cleansed, and formatted so that it fits pre-defined notions of what it represents. Dixon argues that there could be many other valuable notions derived from data found in its unprocessed, natural state.<\/p>\n<p>According to Gartner, the purpose of a data lake is to present an unrefined view of data so skilled analysts can apply their data mining techniques devoid of the \u201csystem-of-record compromises that may exist in a traditional analytic data store (such as a data mart or data warehouse).\u201d The data lake removes the constraints of relational structures when various forms of \u2018big data\u2019 need to be examined in new ways.<\/p>\n<p>Consolidating traditionally isolated data sources can also increase the sharing and use of information, and reduce the cost of the hardware and software that hold that data now.<\/p>\n<p>Data lakes thus provide the \u201copportunity\u201d to avoid preparing and protecting all the data an organization gathers up front, and instead just save it for later when new needs or ideas arise. 
Companies that build successful data lakes find they need to gradually mature their lake as they figure out which data and metadata are interesting to their organization.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-12023\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/05\/data-lakes-infographic-968x1024.png\" alt=\"data lakes infographic\" width=\"600\" height=\"635\" srcset=\"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/data-lakes-infographic-968x1024.png 968w, https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/data-lakes-infographic-284x300.png 284w, https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/data-lakes-infographic-768x812.png 768w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h2>Red Flags &#8211; Governance and\u00a0Efficiency<\/h2>\n<p>There are naysayers who relegate the data lake to a mere notion, particularly since many organizations are unsuccessful with their deployments. Cambridge Semantics CTO Sean Martin said, &#8220;We see customers creating big data graveyards, dumping everything into HDFS and hoping to do something with it down the road. But then they just lose track of what\u2019s there.&#8221;<\/p>\n<p>As with any major data-driven initiative, the lake will have to be sold across the enterprise. Data lakes absorb data from a variety of sources and store it all in one place, and by definition, without the usual requirements for integration (like quality and lineage) or security. Someone will need to be accountable for governance. Data Vault inventor Dan Linstedt warns, for example:<\/p>\n<blockquote>\n<p style=\"text-align: left;\" align=\"center\"><span style=\"color: #339966;\"><em>Users of self-service BI tools trolling the lake have to be governed. 
Think about who gets to use which tool, who gets to log in where and access what data, or who can open a spreadsheet and upload data directly to Hadoop, and then make it available to the rest of the enterprise. That can be a serious problem.<\/em><\/span><\/p>\n<\/blockquote>\n<p>David Weldon&#8217;s April 2017 article in Information Management magazine, &#8220;<em>Many Organizations Struggling to Manage Lakes<\/em>,&#8221; affirmed the issue in this quote from Zaloni&#8217;s CEO Ben Sharma:<\/p>\n<blockquote>\n<p style=\"text-align: left;\" align=\"center\"><span style=\"color: #339966;\"><em>Perhaps the biggest challenge organizations are facing is \u201cfinding, rationalizing and curating the data from across an enterprise for analytics solutions &#8230; the ability to easily access data, refine data and collaborate on data needs continues to be a large roadblock for many analytic applications.\u201d<\/em><\/span><\/p>\n<\/blockquote>\n<p>This is why governing data in a lake is important. Governance means dealing with veracity, security, and metadata lineage issues, to name a few. See De-Mucking the Lake, below.<\/p>\n<p>The other issue to consider is performance. <span style=\"font-style: italic;\">Most <\/span>tools and data interfaces cannot ingest, process, or produce information in an unmanaged lake as efficiently as they can in fit-for-purpose (e.g., query-optimized) environments. 
Thus, consistent semantics and an engine like CoSort will help.<\/p>\n<h2>Stocking the Lake<\/h2>\n<p>Data enter the lake from various sources, including structured data from files and databases (rows and columns), semi-structured data (ASN.1, XML, JSON), unstructured data (emails, documents, and pdfs), or possibly images, audio, and video &#8230; thereby creating a centralized store for all forms of data.<\/p>\n<p><a style=\"color: #1155cc; text-decoration: underline;\" href=\"http:\/\/www.iri.com\/products\/voracity\">IRI Voracity<\/a> is a data connection and curation platform that can be used to populate a data lake by connecting to, profiling, and moving data in different <a style=\"color: #1155cc; text-decoration: underline;\" href=\"http:\/\/www.iri.com\/products\/workbench\/data-sources\">sources<\/a> into the lake:<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/05\/vorcity-flyer-front-no-banner.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-16385 size-large aligncenter\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/08\/voracity-flyer-front-no-banner-1024x695.png\" alt=\"IRI Voracity data management platform\" width=\"1024\" height=\"695\" srcset=\"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2021\/08\/voracity-flyer-front-no-banner-1024x695.png 1024w, https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2021\/08\/voracity-flyer-front-no-banner-300x204.png 300w, https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2021\/08\/voracity-flyer-front-no-banner-768x521.png 768w, https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2021\/08\/voracity-flyer-front-no-banner-1536x1042.png 1536w, https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2021\/08\/voracity-flyer-front-no-banner-2048x1390.png 2048w, https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2021\/08\/voracity-flyer-front-no-banner.png 1110w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><br \/>\nDuring or after movement, you 
can\u00a0<span style=\"background-color: #ffffff;\">select, transform, reformat, and report on data from disparate sources using jobs defined by scripts, wizards, dialogs, or diagrams in Eclipse. You can use Voracity to govern and test-analyze data in the lake, and to move data out of the lake.<br \/>\n<\/span><\/p>\n<h2>Fishing the Lake<\/h2>\n<p>Once you\u2019ve defined the lake\u2019s location and what to pump into it, take time now and again to see what\u2019s currently in it. Consider what experiments can be run on that data. Use your own data discovery tools &#8212; or Voracity\u2019s flat-file, ODBC, and dark (document) data search, statistical, and relationship checking and diagramming tools &#8212; before \u201ctesting the water.\u201d<\/p>\n<p>Think of this aspect of Voracity as sonar, where you\u2019re trying to find different kinds of data at different depths (various <a style=\"color: #1155cc; text-decoration: underline;\" href=\"http:\/\/www.iri.com\/products\/workbench\/data-sources\">sources<\/a> in the lake). Voracity discovery <a style=\"color: #1155cc; text-decoration: underline;\" href=\"http:\/\/www.iri.com\/products\/workbench\/discover-data\">tools<\/a> classify data, and allow you to fuzzy-search for values from ODBC and file sources. They also search those sources, and unstructured sources, for explicit strings, values conforming to canned or custom RegEx patterns, and values in a set (lookup) file. Those tools are actually free, since they only require Voracity\u2019s GUI (IRI Workbench), not an underlying CoSort or Hadoop transformation license.<\/p>\n<p>After identifying data in the lake that looks worthwhile, even data scientists can struggle to derive value from it without the benefit of semantic consistency or managed metadata. It is much harder to manipulate or analyze data without them. 
Voracity wizards auto-create metadata for the collections within, and build ETL, federation, masking, reformatting, and\/or reporting jobs that filter relevant data from the lake, transform it into useful information, and <a style=\"color: #1155cc; text-decoration: underline;\" href=\"http:\/\/www.iri.com\/products\/workbench\/voracity-gui\/display\">display<\/a> it.<\/p>\n<p>Voracity manipulates data with the CoSort engine (by default) or, optionally, in Hadoop with the same metadata. Plugins to Voracity\u2019s Eclipse \u201cWorkbench\u201d GUI also run Python, R, SQL, and shell scripts, SQL procedures, and C\/C++ or Java programs. These tools enable you to do more with lake-related data and apps in the same GUI.<\/p>\n<h2>De-Mucking the Lake<\/h2>\n<p>Recall that a key problem with data lakes, as with real lakes, is that people don\u2019t know what\u2019s in them, or how clean they are. In nature, unknown things in the water can kill the ecosystem. Unknown data dumped into a data lake can kill the project. Dan Linstedt again advises that:<\/p>\n<blockquote>\n<p style=\"text-align: left;\" align=\"center\"><span style=\"color: #339966;\"><em>If there\u2019s no structure, there\u2019s no understanding, and there\u2019s no vision for how to apply this data or even understand what you have. It needs to be classified and cleansed in order to do anything with it and turn it into value-added information for the business. To apply any sort of business information to this data, you must begin to stratify, profile, manage, and understand it, so that you can get results from it.<\/em><\/span><\/p>\n<\/blockquote>\n<p>In short, you have to trust the data enough to trust your analysis. So it\u2019s better to know and manage what\u2019s in the water. 
If you use Voracity, you can discover, integrate, migrate, govern and analyze data in the lake &#8212; or prepare test or production-ready targets for other architectures, like a data warehouse, mart, or ODS &#8212; all within a managed metadata infrastructure.<\/p>\n<p>You also want to be able to\u00a0dredge the data lake clean, at least as much as you can, through various data cleansing operations. You can use Voracity to improve data quality in the lake in these ways:<\/p>\n<ul>\n<li style=\"font-weight: 400;\"><em>Find<b> <\/b><\/em><span style=\"font-weight: 400;\">&#8211; discover, profile, and classify data from a quality standpoint<\/span><\/li>\n<li style=\"font-weight: 400;\"><em>Filter<b> <\/b><\/em><span style=\"font-weight: 400;\">&#8211; remove or save conditionally selected or duplicate items<\/span><\/li>\n<li style=\"font-weight: 400;\"><em>Unify<b> <\/b><\/em><span style=\"font-weight: 400;\">&#8211; data found by fuzzy match algorithms and set probabilities<\/span><\/li>\n<li style=\"font-weight: 400;\"><em>Replace<b> <\/b><\/em><span style=\"font-weight: 400;\">&#8211; data found in pattern searches with literal or lookup values<\/span><\/li>\n<li style=\"font-weight: 400;\"><em>Validate<b> <\/b><\/em><span style=\"font-weight: 400;\">&#8211; identify null values and other data formats by function<\/span><\/li>\n<li style=\"font-weight: 400;\"><em>Regulate<b> <\/b><\/em><span style=\"font-weight: 400;\">&#8211; apply rules to find and fix data out of range or context<\/span><\/li>\n<li style=\"font-weight: 400;\"><em>Synthesize<b> <\/b><\/em><span style=\"font-weight: 400;\">&#8211; custom composite data types, and new row or file formats<\/span><\/li>\n<li style=\"font-weight: 400;\"><em>Standardize<b> <\/b><\/em><span style=\"font-weight: 400;\">&#8211; use field-function APIs for Melissa Data or Trillium<\/span><\/li>\n<\/ul>\n<p>With less garbage in the lake, less garbage will come out in your analytic results, and the water will be cleaner 
for everyone else, too.<\/p>\n<p>Another governance issue is finding, classifying, and de-identifying personally identifiable information (PII) in the data sets. Voracity addresses this as well, and offers a wide range of rule- (and role-) based encryption, redaction, pseudonymization, and related data protection functions that can be applied ad hoc or globally to like columns.<\/p>\n<h2>Other Conservation Programs<\/h2>\n<p style=\"background-color: #ffffff;\"><span style=\"background-color: #ffffff;\"><span style=\"background-color: #ffffff;\">For advanced information architects, Linstedt advocates combining Data Vault with Voracity:<br \/>\n<\/span><\/span><\/p>\n<blockquote>\n<p style=\"text-align: left;\" align=\"center\"><span style=\"color: #339966;\"><em>As far as IRI is concerned, I like their solution because we can govern the end-to-end processing in a central place. With that governance comes the ability to manage. Wrap the Data Vault architecture into that mix, and all of a sudden you have standards around your IT, data and information processes, and around the data modeling constructs that are behind the scenes of a future warehouse iteration of this data. <\/em><\/span><\/p>\n<\/blockquote>\n<p style=\"background-color: #ffffff;\"><span style=\"background-color: #ffffff;\"><span style=\"background-color: #ffffff;\">Additional administrative management of the environment is helpful, too: <\/span><\/span><\/p>\n<blockquote>\n<p style=\"text-align: left;\" align=\"center\"><span style=\"color: #339966;\"><em>You need centralized or shareable metadata that persists and can be readily modified. 
And if you can automate processes that prepare and report on data, then you can leverage process repeatedly for what-if analysis and thus get to improved results sooner.<\/em><\/span><\/p>\n<\/blockquote>\n<p>Voracity\u2019s approach to <a style=\"color: #1155cc; text-decoration: underline;\" href=\"http:\/\/www.iri.com\/solutions\/metadata-mdm\/metadata-management\">metadata management<\/a> is simplified by virtue of automatic metadata creation, self-documenting syntax, and hub support in Eclipse systems like <a href=\"http:\/\/www.iri.com\/blog\/iri\/iri-workbench\/introduction-metadata-management-hub\/\">Git<\/a> for lineage, security, and version control. For more advanced metadata management and automation, Voracity users can leverage a\u00a0<a href=\"https:\/\/www.iri.com\/ftp9\/pdf\/Voracity\/IRI-Voracity-AnalytiXDS-Platform-Combo-Data-Sheet.pdf\">seamless bridge<\/a> to a graphical lineage and impact analysis environment in Erwin Mapping Manager.<\/p>\n<p>Voracity\u2019s built-in task scheduler allows you to sequence &#8212; and fine-tune the repetition of &#8212; integration, cleansing, masking, reporting, and\/or other jobs you might want to run on lake data.<\/p>\n<p>The bottom line is that a data lake can be a helpful place to test new theories about data now in silos. 
So stock it, mind your visitors, and see what good can surface from the muck.<\/p>\n<p align=\"center\"><a href=\"\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-13150\" src=\"\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg\" alt=\"\" width=\"651\" height=\"265\" srcset=\"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg 960w, https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image-300x122.jpg 300w, https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image-768x313.jpg 768w\" sizes=\"(max-width: 651px) 100vw, 651px\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Has your organization considered using a data lake? This article explains what a data lake is, and how you can fish its murky depths for value in an architecture optimized for your needs. IRI Voracity users should also be able to assess this approach relative to other paradigms the platform supports, including LDW, EDW, ODS,<\/p>\n<div><a class=\"btn-filled btn\" href=\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/\" title=\"The Use of Data Lakes\">Read 
More<\/a><\/div>\n","protected":false},"author":5,"featured_media":13150,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[108,32,363,776,34],"tags":[52,1037,1039,1032,1036,1040,101,1018,861,81,789,1035,1033,281,75,1034,497,1038],"class_list":["post-9584","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-big-data-2","category-business-intelligence","category-data-quality","category-etl","category-business","tag-business-intelligence-2","tag-cambridge-semantics","tag-dan-lindstedt","tag-data-lake","tag-data-mart","tag-data-vault","tag-data-warehouse","tag-edw","tag-gartner","tag-hadoop","tag-iri-voracity","tag-james-dixon","tag-ldw","tag-metadata-management-2","tag-odbc","tag-ods","tag-pentaho","tag-sean-martin"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>The Use of Data Lakes - IRI<\/title>\n<meta name=\"description\" content=\"Has your organization considered using a data lake? This article explains what a data lake is, and how you can fish its murky depths\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"The Use of Data Lakes - IRI\" \/>\n<meta property=\"og:description\" content=\"Has your organization considered using a data lake? 
This article explains what a data lake is, and how you can fish its murky depths\" \/>\n<meta property=\"og:url\" content=\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/\" \/>\n<meta property=\"og:site_name\" content=\"IRI\" \/>\n<meta property=\"article:published_time\" content=\"2016-05-17T19:43:01+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-02-10T20:57:36+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"960\" \/>\n\t<meta property=\"og:image:height\" content=\"391\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Jason Koivu\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Jason Koivu\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/\"},\"author\":{\"name\":\"Jason Koivu\",\"@id\":\"https:\/\/beta.iri.com\/blog\/#\/schema\/person\/c60bc4ff5919427034376979fb2cc8df\"},\"headline\":\"The Use of Data Lakes\",\"datePublished\":\"2016-05-17T19:43:01+00:00\",\"dateModified\":\"2023-02-10T20:57:36+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/\"},\"wordCount\":1920,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/beta.iri.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg\",\"keywords\":[\"business intelligence\",\"Cambridge Semantics\",\"Dan Lindstedt\",\"data lake\",\"data mart\",\"Data Vault\",\"data warehouse\",\"EDW\",\"Gartner\",\"hadoop\",\"IRI Voracity\",\"James Dixon\",\"LDW\",\"metadata management\",\"ODBC\",\"ODS\",\"pentaho\",\"Sean Martin\"],\"articleSection\":[\"Big Data\",\"Business Intelligence (BI&#041;\",\"Data Quality (DQ&#041;\",\"ETL\",\"IRI Business\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/\",\"url\":\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/\",\"name\":\"The Use of Data Lakes - 
IRI\",\"isPartOf\":{\"@id\":\"https:\/\/beta.iri.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg\",\"datePublished\":\"2016-05-17T19:43:01+00:00\",\"dateModified\":\"2023-02-10T20:57:36+00:00\",\"description\":\"Has your organization considered using a data lake? This article explains what a data lake is, and how you can fish its murky depths\",\"breadcrumb\":{\"@id\":\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#primaryimage\",\"url\":\"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg\",\"contentUrl\":\"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg\",\"width\":960,\"height\":391},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/beta.iri.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"The Use of Data Lakes\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/beta.iri.com\/blog\/#website\",\"url\":\"https:\/\/beta.iri.com\/blog\/\",\"name\":\"IRI\",\"description\":\"Total Data Management 
Blog\",\"publisher\":{\"@id\":\"https:\/\/beta.iri.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/beta.iri.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/beta.iri.com\/blog\/#organization\",\"name\":\"IRI\",\"url\":\"https:\/\/beta.iri.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/beta.iri.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"contentUrl\":\"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"width\":750,\"height\":206,\"caption\":\"IRI\"},\"image\":{\"@id\":\"https:\/\/beta.iri.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/beta.iri.com\/blog\/#\/schema\/person\/c60bc4ff5919427034376979fb2cc8df\",\"name\":\"Jason Koivu\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/beta.iri.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/01e97234ff964558ca620a43a0506ef0?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/01e97234ff964558ca620a43a0506ef0?s=96&d=blank&r=g\",\"caption\":\"Jason Koivu\"},\"url\":\"https:\/\/beta.iri.com\/blog\/author\/jasonk\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"The Use of Data Lakes - IRI","description":"Has your organization considered using a data lake? 
This article explains what a data lake is, and how you can fish its murky depths","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/","og_locale":"en_US","og_type":"article","og_title":"The Use of Data Lakes - IRI","og_description":"Has your organization considered using a data lake? This article explains what a data lake is, and how you can fish its murky depths","og_url":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/","og_site_name":"IRI","article_published_time":"2016-05-17T19:43:01+00:00","article_modified_time":"2023-02-10T20:57:36+00:00","og_image":[{"width":960,"height":391,"url":"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg","type":"image\/jpeg"}],"author":"Jason Koivu","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Jason Koivu","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#article","isPartOf":{"@id":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/"},"author":{"name":"Jason Koivu","@id":"https:\/\/beta.iri.com\/blog\/#\/schema\/person\/c60bc4ff5919427034376979fb2cc8df"},"headline":"The Use of Data Lakes","datePublished":"2016-05-17T19:43:01+00:00","dateModified":"2023-02-10T20:57:36+00:00","mainEntityOfPage":{"@id":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/"},"wordCount":1920,"commentCount":0,"publisher":{"@id":"https:\/\/beta.iri.com\/blog\/#organization"},"image":{"@id":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#primaryimage"},"thumbnailUrl":"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg","keywords":["business intelligence","Cambridge Semantics","Dan Lindstedt","data lake","data mart","Data Vault","data warehouse","EDW","Gartner","hadoop","IRI Voracity","James Dixon","LDW","metadata management","ODBC","ODS","pentaho","Sean Martin"],"articleSection":["Big Data","Business Intelligence (BI&#041;","Data Quality (DQ&#041;","ETL","IRI Business"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/","url":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/","name":"The Use of Data Lakes - 
IRI","isPartOf":{"@id":"https:\/\/beta.iri.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#primaryimage"},"image":{"@id":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#primaryimage"},"thumbnailUrl":"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg","datePublished":"2016-05-17T19:43:01+00:00","dateModified":"2023-02-10T20:57:36+00:00","description":"Has your organization considered using a data lake? This article explains what a data lake is, and how you can fish its murky depths","breadcrumb":{"@id":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#primaryimage","url":"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg","contentUrl":"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg","width":960,"height":391},{"@type":"BreadcrumbList","@id":"https:\/\/beta.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/beta.iri.com\/blog\/"},{"@type":"ListItem","position":2,"name":"The Use of Data Lakes"}]},{"@type":"WebSite","@id":"https:\/\/beta.iri.com\/blog\/#website","url":"https:\/\/beta.iri.com\/blog\/","name":"IRI","description":"Total Data Management Blog","publisher":{"@id":"https:\/\/beta.iri.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/beta.iri.com\/blog\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/beta.iri.com\/blog\/#organization","name":"IRI","url":"https:\/\/beta.iri.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/beta.iri.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","contentUrl":"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","width":750,"height":206,"caption":"IRI"},"image":{"@id":"https:\/\/beta.iri.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/beta.iri.com\/blog\/#\/schema\/person\/c60bc4ff5919427034376979fb2cc8df","name":"Jason Koivu","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/beta.iri.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/01e97234ff964558ca620a43a0506ef0?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/01e97234ff964558ca620a43a0506ef0?s=96&d=blank&r=g","caption":"Jason 
Koivu"},"url":"https:\/\/beta.iri.com\/blog\/author\/jasonk\/"}]}},"jetpack_featured_media_url":"https:\/\/beta.iri.com\/blog\/wp-content\/uploads\/2016\/05\/lake-image.jpg","_links":{"self":[{"href":"https:\/\/beta.iri.com\/blog\/wp-json\/wp\/v2\/posts\/9584"}],"collection":[{"href":"https:\/\/beta.iri.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/beta.iri.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/beta.iri.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/beta.iri.com\/blog\/wp-json\/wp\/v2\/comments?post=9584"}],"version-history":[{"count":42,"href":"https:\/\/beta.iri.com\/blog\/wp-json\/wp\/v2\/posts\/9584\/revisions"}],"predecessor-version":[{"id":16429,"href":"https:\/\/beta.iri.com\/blog\/wp-json\/wp\/v2\/posts\/9584\/revisions\/16429"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/beta.iri.com\/blog\/wp-json\/wp\/v2\/media\/13150"}],"wp:attachment":[{"href":"https:\/\/beta.iri.com\/blog\/wp-json\/wp\/v2\/media?parent=9584"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/beta.iri.com\/blog\/wp-json\/wp\/v2\/categories?post=9584"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/beta.iri.com\/blog\/wp-json\/wp\/v2\/tags?post=9584"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}