ETL vs ELT: We Posit, You Judge
Full disclosure: As this article is authored by an ETL-centric company with its strong suit in manipulating big data outside of databases, what follows will not seem objective to many. Read More
Data profiling, or data discovery, refers to the process of obtaining information from, and descriptive statistics about, various sources of data. The purpose of data profiling is to get a better understanding of the content of data, as well as its structure, relationships, and current levels of accuracy and integrity. Read More
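As a rough illustration of the descriptive statistics a profiling pass might gather, here is a minimal Python sketch for a single column of text values. The function name `profile_column` and the particular statistics chosen are assumptions made for this example; they do not come from any IRI tool.

```python
from collections import Counter
from statistics import mean


def profile_column(values):
    """Return simple descriptive statistics for one column of string values.

    Hypothetical helper for illustration only.
    """
    # Values that are None or blank count as missing.
    non_empty = [v for v in values if v is not None and v.strip() != ""]
    profile = {
        "count": len(values),
        "missing": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "most_common": Counter(non_empty).most_common(1)[0][0] if non_empty else None,
    }
    # Treat the column as numeric only if every non-empty value parses as a float.
    try:
        numbers = [float(v) for v in non_empty]
    except ValueError:
        numbers = []
    if numbers:
        profile.update(min=min(numbers), max=max(numbers), mean=mean(numbers))
    return profile


if __name__ == "__main__":
    print(profile_column(["10", "20", "", "20"]))
```

A real profiling job would also look at structure and relationships across columns and sources; this sketch covers only per-column content statistics.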
IRI’s data management tools share a familiar and self-documenting metadata language called SortCL. All these tools — including CoSort, FieldShield, NextForm, and RowGen — require data definition file (DDF) layouts with /FIELD specifications for each data source so you can map your data and manage your metadata. Read More
One of the best ways to speed up big data processing operations is to not process so much data in the first place; i.e. to eliminate unnecessary data ahead of time. Read More
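The idea of discarding unneeded records before the expensive step can be sketched in a few lines of Python. The function `filter_then_sort`, the sample records, and the field names `region` and `amount` are all hypothetical, chosen only to show the ordering of the two steps.

```python
def filter_then_sort(rows, keep, key):
    """Drop rows that fail `keep` first, so the sort only handles what is needed."""
    return sorted((row for row in rows if keep(row)), key=key)


if __name__ == "__main__":
    # Hypothetical sample records.
    rows = [
        {"region": "east", "amount": 30},
        {"region": "west", "amount": 10},
        {"region": "east", "amount": 5},
    ]
    east_only = filter_then_sort(
        rows,
        keep=lambda r: r["region"] == "east",
        key=lambda r: r["amount"],
    )
    print(east_only)
```

Because sorting costs more than a single pass over the data, filtering first reduces the work done in the costlier step.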
Big data integration activities can happen outside the database in an extract, transform, load (ETL) environment, or inside the database in ELT:
http://www.iri.com/blog/data-transformation2/etl-vs-elt-we-posit-you-judge/
One example of an ELT operation would be Informatica’s Pushdown Optimization option, in which users transform data inside a relational database such as Oracle or Teradata. Read More
The IRI data management platform Voracity, as well as its constituent tools, can perform and speed big data warehouse extract, transform, load (ETL) operations, delaying the need for new hardware or expensive proprietary appliances: http://www.iri.com/blog/data-transformation2/a-big-data-quandary-hardware-or-software-appliances-or-cosort/ Read More
In 1992, Digital Equipment Corporation (DEC, long since acquired) asked IRI to develop a 4GL interface to CoSort in the syntax of the VAX VMS sort/merge utility. Read More
This demonstration shows how to set up a sort job for CoSort using the IRI Workbench. The sort is accomplished using the SortCL language. This video takes a CSV input file, shows how to define the sort keys and options, and demonstrates how to define the targets for output. Read More
This demonstration shows how to use the IRI Workbench to create an aggregation job using sums. Workbench is used to create the job script in the SortCL language. Read More
The sort utility included with each Unix-based operating system is a standard command-line program that prints the lines of its input, or of specified input files, in the specified sort order. Read More
Big Data Problem
Big data volumes are growing exponentially. This phenomenon has been happening for years, but its pace has accelerated dramatically since 2012. Check out this blog entitled "Big Data Just Beginning to Explode" from CSC for a similar viewpoint on the emergence of big data and the challenges related thereto. Read More