Business Intelligence

Kettle (Pentaho Data Integration)
R

MDX: https://en.wikipedia.org/wiki/MultiDimensional_eXpressions
YANG data modeler

https://www.quora.com/What-algorithms-and-techniques-20-to-50-are-must-knows-for-a-junior-data-scientist
https://qz.com/1126615/the-story-of-the-most-important-tool-in-data-science/
http://www.sqlchick.com/entries/2013/12/5/getting-started-with-a-career-in-bi.html
https://joshuafennessy.com/2014/04/15/breaking-into-bi-where-to-begin/
http://www.b-eye-network.com/view/17084
https://www.nten.org/article/how-and-where-to-get-started-with-bi/
https://www.birst.com/wp-content/uploads/2015/04/gettingstarteddec13-2.pdf
https://www.quora.com/How-should-one-begin-career-in-data-mining-business-intelligence
http://www.compellingbi.com/t/how-to-build-a-successful-business-intelligence-career/105
https://stackoverflow.com/questions/139988/advice-for-someone-who-wants-to-start-in-business-intelligence

https://hackernoon.com/the-great-capacity-of-statistics-collections-july-edition-of-2017-5a0cca7f51c9
https://dzone.com/articles/fast-cars-big-data-how-streaming-data-can-help-for
https://dzone.com/storage/assets/5270923-dzone-rc160-datawarehousing.pdf
https://dzone.com/articles/creating-your-first-time-series-model-with-the-das
https://www.sitepoint.com/google-data-studio-a-nifty-free-easy-to-use-data-vis-tool
http://www.appcues.com/blog/redefining-pirate-metrics
Data warehousing and best practices for data warehousing
Data migration and best practices for data migration

https://drill.apache.org/
https://www.periscopedata.com/

http://www.7wdata.be/data-management/22-data-experts-reveal-the-top-techniques-to-double-the-effectiveness-of-your-big-data-analysis-efforts/
http://www.docurated.com/all-things-productivity/50-best-business-intelligence-tools
https://techcrunch.com/2012/07/18/twitter-storm-nodeable-pivot/
http://www.7wdata.be/human-resources/3-mistakes-managers-make-with-data-analytics

http://www.localytics.com/
https://www.kissmetrics.com/
https://www.kissmetrics.com/customers/buffer/
https://www.domo.com/solution/excel-reporting-dashboard
http://www.lancetdatasciences.com/resource-center/blog/microstrategy-tech-tip/tech-tip-how-create-microstrategy-intelligent-cubes
BIRT Actuate
BIRT
BusinessObjects
MicroStrategy
Hyperion
Tableau
QlikTech
Brio
Cognos BI
Informatica
DataStage
Information Builders
Crystal Reports
SAP
SAS
Oracle
Microsoft
Hyperion Solutions' Essbase
http://en.wikipedia.org/wiki/Oracle_Hyperion
http://en.wikipedia.org/wiki/Cognos
http://en.wikipedia.org/wiki/BusinessObjects
http://en.wikipedia.org/wiki/SAP_SE
http://en.wikipedia.org/wiki/Applix
http://en.wikipedia.org/wiki/Infor
Oracle's Express Server
Microsoft Analysis Services

Public data sets

Data Analysis
Data Mining
KPI
https://www.linkedin.com/pulse/3-cardinal-sins-kpis-performance-metrics-bernard-marr
statistical analysis
forecasting
decision support

https://www.youtube.com/watch?v=Sl_iviSSzXU
https://www.youtube.com/watch?v=0CBgt9eFCZE
https://www.youtube.com/watch?v=wdJw1GzXVro
https://www.youtube.com/watch?v=0NZv4ZkvTW8

Schemas design - Stars versus Snowflakes
Dimension tables
Fact tables

Ralph Kimball and Margy Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition), p. 393
Ralph Kimball and Bill Inmon

http://datawarehouse4u.info/OLTP-vs-OLAP.html - done reading
http://searchdatamanagement.techtarget.com/definition/OLAP - done reading
http://en.wikipedia.org/wiki/Online_analytical_processing - done reading
http://en.wikipedia.org/wiki/Fact_table - done reading
http://en.wikipedia.org/wiki/Dimension_%28data_warehouse%29 - done reading
http://en.wikipedia.org/wiki/Dimension_table - done reading
http://en.wikipedia.org/wiki/Star_schema - done reading
http://www.dwhworld.com/datawarehouse-and-business-intelligence/ - done reading
http://www.dwhworld.com/dwh-architecture/ - done reading
http://www.dwhworld.com/dwh-schemas/ - done reading
http://www.dwhworld.com/dwh-dimensions/ - done reading
http://www.dwhworld.com/dwh-facts/ - done reading
http://www.dwhworld.com/dwh-indexes/ - done reading
http://www.dwhworld.com/2010/11/date-dimension-sql-scripts-oracle/ - done reading
http://docs.oracle.com/cd/B10501_01/server.920/a96520/indexes.htm#745 - done reading
http://ciobriefings.com/Publications/WhitePapers/DesigningtheStarSchemaDatabase/tabid/101/Default.aspx
http://c2.com/ppr/stars.html
http://datawarehouse4u.info/Data-warehouse-schema-architecture-fact-constellation-schema.html
http://www.b-eye-network.com/view/8451
http://en.wikipedia.org/wiki/Reverse_star_schema
http://en.wikipedia.org/wiki/Snowflake_schema
http://en.wikipedia.org/wiki/Fact_constellation
http://en.wikipedia.org/wiki/Multidimensional_Expressions
http://en.wikipedia.org/wiki/XML_for_Analysis
http://www.dwoptimize.com/2007/06/aiming-for-stars.html
http://en.wikipedia.org/wiki/LINQ
http://en.wikipedia.org/wiki/Microsoft_Analysis_Services
http://en.wikipedia.org/wiki/Data_warehouse
http://en.wikipedia.org/wiki/Business_analytics
http://en.wikipedia.org/wiki/Predictive_analytics
http://en.wikipedia.org/wiki/Data_Mining
http://technet.microsoft.com/en-us/library/cc966470.aspx - Data Warehousing Framework
http://www.bi-verdict.com/fileadmin/dl_temp/de82b279d7ed5d26b13a526e91bd8c8f/DatabaseExplosion.htm
http://olap.com/w/index.php/Example_1

http://en.wikipedia.org/wiki/Data_analysis
http://www.sciencebuddies.org/science-fair-projects/project_data_analysis.shtml
http://www.linkedin.com/skills/skill/Data_Analysis
http://www.robertniles.com/stats/dataanly.shtml
http://www.perceptualedge.com/blog/?p=606
http://www.journals.elsevier.com/computational-statistics-and-data-analysis/
http://www.iospress.nl/journal/intelligent-data-analysis/
http://datajournalism.stanford.edu/
http://selection.datavisualization.ch/
https://digitalresearchtools.pbworks.com/w/page/17801661/Data%20Visualization
http://www.computerworld.com/s/article/9215504/22_free_tools_for_data_visualization_and_analysis

http://www.youtube.com/watch?v=CA-InkV0CTM&list=PL532F8048B644B707
http://www.youtube.com/watch?v=HXAstVP3-y0&list=PLE64EFC7336E15AD4

http://www.youtube.com/watch?v=G18YECWHlaw - DB Modelling: OLAP Database Design Part 1
http://www.youtube.com/watch?v=2ryG3Jy6eIY - What is OLAP
http://www.youtube.com/watch?v=MhhCEGVBMxg - Business Intelligence EP2 / Part2
http://www.youtube.com/watch?v=ddkIpU9OINg - BI Tool Vendor Grudge Match Presentation Intro (1 of 2)
http://www.youtube.com/watch?v=qe39vPFabuA - What's a Data Warehouse and How Do I Test It?

http://www.youtube.com/watch?v=cSXWTNYn3es - Data Warehousing Explained, Illustrated, and Subtitled (Starring Sakila)
http://www.youtube.com/watch?v=mgEugd5kZgk - Demystifying the Data Warehouse
http://www.youtube.com/watch?v=K_FCHYWGGug&feature=BFa&list=UUrR22MmDd5-cKP2jTVKpBcQ
http://www.youtube.com/watch?v=r0Z_rOGFv7Q&feature=results_main&playnext=1&list=PLB88869FBD2EBBF4A
http://www.youtube.com/watch?v=LFnewuBsYiY&list=UUrR22MmDd5-cKP2jTVKpBcQ
http://www.youtube.com/watch?v=-j5J7lXav7Y - Business Intelligence Demonstration

http://www.youtube.com/watch?v=iSXNfZESR5I
http://www.youtube.com/watch?v=TaxJwC_MP9Q
http://www.youtube.com/watch?v=9LnpFCrX2uc
http://www.youtube.com/watch?v=qdDHp29QVdw
http://www.youtube.com/watch?v=7nL6F1wEB68
http://www.youtube.com/watch?v=8vfYo5WHT94
http://www.youtube.com/watch?v=RTZHamC9WCI
http://www.youtube.com/watch?v=fEUw8igr1IY
http://www.youtube.com/watch?v=NXgS1ZQ-Uw0
http://www.youtube.com/watch?v=9UHIga7Juhk
http://www.youtube.com/watch?v=9gJV-LF2t10
http://www.youtube.com/watch?v=n3iANHusfcY
http://www.youtube.com/watch?v=-tWS0tN8sW0
http://www.youtube.com/watch?v=DAXIWd_LDgI
http://www.youtube.com/watch?v=072Q18nX91I
http://www.youtube.com/watch?v=YFC2KUmEebc
http://www.youtube.com/watch?v=SFF-yUoLdHg
http://www.youtube.com/watch?v=WERsvhCkHhU
http://www.youtube.com/watch?v=mK7qmZ2-wxc
http://www.youtube.com/watch?v=Ef1xvuMBOz4
http://www.youtube.com/watch?v=qLvLg-sqxKc
http://www.youtube.com/watch?v=yEI57SK0P0U
http://www.youtube.com/watch?v=hJKX_iXKBbU
http://www.youtube.com/watch?v=6b3-0jIrYZM
http://www.youtube.com/watch?v=X9YVSDWQokQ
http://www.youtube.com/watch?v=G_axOe2r6eY
http://www.youtube.com/watch?v=MzS83BGdWco
http://www.youtube.com/watch?v=KaJtT1K3GtI
http://www.youtube.com/watch?v=Ld1Pcxgu1pE
http://www.youtube.com/watch?v=Ht76Qt0eVe0
http://www.youtube.com/watch?v=FIbVs5GbBlQ
http://www.youtube.com/watch?v=L_0efNkdGMc
http://www.youtube.com/watch?v=N_hWaPRfjMg
http://www.youtube.com/watch?v=6jT6Rit_5EQ
http://www.youtube.com/watch?v=QEaOfTuveGg
http://www.youtube.com/watch?v=R4crrQnBifg
http://www.youtube.com/watch?v=Tw684qC97-0
http://www.youtube.com/watch?v=EO4EOBZ-t6w
http://www.youtube.com/watch?v=RHu5vgBZ1yQ
http://www.youtube.com/watch?v=rFWsstnG9J4
http://www.youtube.com/watch?v=yHpW1v97A3M
http://www.youtube.com/watch?v=1Lh1HlBUf8k
http://www.youtube.com/watch?v=yWexhOmkVKU
http://www.youtube.com/watch?v=zwtg2yOCdpM
http://www.youtube.com/watch?v=PNu0BzKvqAM
http://www.youtube.com/watch?v=5Lu1eTiX7qM
http://www.youtube.com/watch?v=o7iDkcpOr_g
http://www.youtube.com/watch?v=GBzoNgqF-gQ
http://www.youtube.com/watch?v=bRzOBGLCRbc
http://www.youtube.com/watch?v=0tuEEnL61HM
http://www.youtube.com/watch?v=UbLCEJ4sRFw
http://www.youtube.com/watch?v=QI8623HlYd4
http://www.youtube.com/watch?v=I1Z8J5JvKtY
http://www.youtube.com/watch?v=lqQ6VFd3Tnw
http://www.youtube.com/watch?v=QBpguVZRVPo
http://www.youtube.com/watch?v=CA-InkV0CTM
http://www.youtube.com/watch?v=8egzxhgTV5Q
http://www.youtube.com/watch?v=t3fEGhE-HYA
http://www.youtube.com/watch?v=UiIjEzW3br8
http://www.youtube.com/watch?v=duHxpSTmwW0
http://www.youtube.com/watch?v=PAYOjnLICo4
http://www.youtube.com/watch?v=LPYw5p8J7i0
http://www.youtube.com/watch?v=_eWDR3P8Lhw
http://www.youtube.com/watch?v=qgGqaBAEy3Q
http://www.youtube.com/watch?v=cHZONQ2-x7I
http://www.youtube.com/watch?v=MxRMXhjXZos
http://www.youtube.com/watch?v=w26x-z-BdWQ
http://www.youtube.com/watch?v=__s45TTXxps
http://www.youtube.com/watch?v=Zd5dfooZWG4
http://www.youtube.com/watch?v=eHsErlPJWUU
http://www.youtube.com/watch?v=0unf-C-pBYE
http://www.youtube.com/watch?v=33L_EXLtJPE
http://www.youtube.com/watch?v=oGcZ7WVx6EI
http://www.youtube.com/watch?v=1mJlgets6ds
http://www.youtube.com/watch?v=aKs1Vgtb1do
http://www.youtube.com/watch?v=vYrWTDxoeGg
http://www.youtube.com/watch?v=pyac2Xm38qQ
http://www.youtube.com/watch?v=5DSahEbJ4KY
http://www.youtube.com/watch?v=kwt6XEh7U3g
http://www.youtube.com/watch?v=zRsMEl6PHhM
http://www.youtube.com/watch?v=1HAAF4UT75o
http://www.youtube.com/watch?v=qBcI9WakS2o
http://www.youtube.com/watch?v=iXCPJNT9ZOQ
http://www.youtube.com/watch?v=XzxGnF_eiNo
http://www.youtube.com/watch?v=FoKxzorQIhU
http://www.youtube.com/watch?v=N5i85v0ckzY
http://www.youtube.com/watch?v=xpuB9ydmBsM
http://www.youtube.com/watch?v=CzvgrcQhWGg
http://www.youtube.com/watch?v=l4a3e__QzoY
http://www.youtube.com/watch?v=fmZYH3rmqDQ
http://www.youtube.com/watch?v=GGjsMVphGLU
http://www.youtube.com/watch?v=FO0fgVS9OmE
http://www.youtube.com/watch?v=bobeo5kFz1g
http://www.youtube.com/watch?v=s2oTUsAJfjI
http://www.youtube.com/watch?v=M1vMNu5j430
http://www.youtube.com/watch?v=LhFDAzER0NE
http://www.youtube.com/watch?v=OfZOS6vlD8I
http://www.youtube.com/watch?v=WOOTNBxbi8c
http://www.youtube.com/watch?v=Rm6s6gmLTdg
http://www.youtube.com/watch?v=sRktKszFmSk
http://www.youtube.com/watch?v=F-nfsSq42ow
http://www.youtube.com/watch?v=NCXcDje9uxA
http://www.youtube.com/watch?v=mbyG85GZ0PI
http://www.youtube.com/watch?v=q0srNT_XM_Y
http://www.youtube.com/watch?v=aZ5ckLy6IdE
http://www.youtube.com/watch?v=vpD5UG-tFsY
http://www.youtube.com/watch?v=4ONBVNm3isI
http://www.youtube.com/watch?v=aGbMg3EOWuM
http://www.youtube.com/watch?v=3kYujfDgmNk
http://www.youtube.com/watch?v=T5ZjSFnOxys
http://www.youtube.com/watch?v=E1LwqtBdPYs

Star schema and Snowflake schema:
Data Warehousing Best Practices Star Schemas - watched
Big Data, OLAP, BI Tutorial - Stars, Wormholes, and Data Warehouse Design - watched, relevant
Data Modeling and Conceptual Sketching in the Design Process
The Theory of Normalization
DB Modelling: OLAP Database Design Part 1 - watched, relevant
Analysis Services - 11 Star and Snowflake Schemas
The Thinking Persons Guide to Data Warehouse Design
Business Dimensional Modeling with Laura Reeves - BI Knowledge Exchange - Part 1
3NF - Data Vault - Star Schema by Data Vault Academy
INF 3: What is Star Schema
INF 4: What is Snowflake Schema
Data Warehouse Star Schema - Kalman Toth
4 Snowflake dimensions, defining hierarchies, defining dimension primary keys
IBI10 - Creating Staging Areas
Slowly Changing Dimension Type 2 Illustration Using Informatica - By Mohan Vamsi

ETL:
Data Warehouse tutorial. Creating an ETL using MS Visual Studio
ETL Testing: How to Achieve 100% Data Validation
What's a Data Warehouse and How Do I Test It?
ETL Essential Fundamentals

Data Warehousing and Data Mining:
Introduction to Data Warehousing and Data Mining
Data Warehousing Part 1
Sakila: Data Warehousing Explained, Illustrated, and Subtitled
Data Mining: The Tool of The Information Age

Microsoft SQL Server:
Microsoft SQL Server Analysis Services tutorial. Creating OLAP cube. Introduction to data warehouse

Statistical Aspects of Data Mining (Stats 202) Day 1
Statistical Aspects of Data Mining (Stats 202) Day 2
Statistical Aspects of Data Mining (Stats 202) Day 3
Statistical Aspects of Data Mining (Stats 202) Day 4
Statistical Aspects of Data Mining (Stats 202) Day 5
Statistical Aspects of Data Mining (Stats 202) Day 6
Statistical Aspects of Data Mining (Stats 202) Day 7
Statistical Aspects of Data Mining (Stats 202) Day 8
Statistical Aspects of Data Mining (Stats 202) Day 9
Statistical Aspects of Data Mining (Stats 202) Day 10

O'Reilly Webcast: Social Network Analysis -- Finding communities and influencers
Analyzing Social Networks on Twitter
Social Network Analysis with Python
Lecture 14 - Analyzing Big Data with Twitter: Stan Nikolov on Information Diffusion at Twitter
Brains, Meaning and Corpus Statistics
Introduction to Predictive Analytics
R and Hadoop - Big Data Analytics
Using Segmentation and Predictive Analytics to Combat Attrition
Realtime Analytics for Big Data: A Facebook Case Study

MySQL Data Warehousing:
http://www.youtube.com/watch?v=U0jw4gkGt9M
http://www.youtube.com/watch?v=zdSYCwgE6nA
http://www.youtube.com/watch?v=cSXWTNYn3es
http://www.youtube.com/watch?v=s-KIpKjDMNI
http://www.youtube.com/watch?v=G_iaJ8TFwy8

Big Data:
http://www.youtube.com/watch?v=zY_brMqvhzk
http://www.youtube.com/watch?v=UiIjEzW3br8
http://www.youtube.com/watch?v=UbLCEJ4sRFw
http://www.youtube.com/watch?v=lbCmFZpMNxA
http://www.youtube.com/watch?v=8egzxhgTV5Q
http://www.youtube.com/watch?v=E1LwqtBdPYs
http://www.youtube.com/watch?v=b9UJ6W0jG1M
http://www.youtube.com/watch?v=ceeiUAmbfZk
Big Data - The Hadoop Data Warehouse - Part 1
Big Data tutorial by Marko Grobelnik
Analyzing Big Data with Twitter - Lec. 2 - Growing a Human-Scale Service & the Twitter Ecosystem
Big Data and Hadoop 1 | Hadoop Tutorials 1 |Big Data Tutorial 1 |Hadoop Tutorial for Beginners -1
Realtime Analytics for Big Data: A Facebook Case Study
Big Data & BI Best Practices Webinar by Yellowfin and Actian Vectorwise

Data Visualization:
http://www.youtube.com/watch?v=R-oiKt7bUU8

MongoDB:
http://www.youtube.com/watch?v=PIWVFUtBV1Q

NoSQL:
http://www.youtube.com/watch?v=PIWVFUtBV1Q

Informatica:
http://www.youtube.com/watch?v=gCsQbL3qT1g

Data Mining:
http://www.youtube.com/watch?v=tAMmciaSpvk
http://www.youtube.com/watch?v=DSQST2EaPnc

Data Mining using R:
http://www.youtube.com/watch?v=4V23ZQQfwYk
http://www.youtube.com/watch?v=CtCJFoSr1Ck
http://www.youtube.com/watch?v=HKjSKtVV6GU
http://www.youtube.com/watch?v=6jT6Rit_5EQ
http://www.youtube.com/watch?v=LjuXiBjxryQ
http://www.youtube.com/watch?v=s0B1x08Qpzw
http://www.youtube.com/watch?v=5zk93CpKYhg
http://www.youtube.com/watch?v=WJDrYUqNrHg
http://www.youtube.com/watch?v=rFWsstnG9J4
http://www.youtube.com/watch?v=VLtazaiYo-c
http://www.youtube.com/watch?v=yLF45x70A3o
http://www.youtube.com/watch?v=R5Z22gwnpCk
http://www.youtube.com/watch?v=l430jJ3guTI
http://www.youtube.com/watch?v=62EY_9TtbUM
http://www.youtube.com/watch?v=IiVq8M5DBkk
http://www.youtube.com/watch?v=qHfSTRNg6jE
http://www.youtube.com/watch?v=AipnE4s8sKk&list=PL224DB930CC1F4F90
http://www.youtube.com/watch?v=viPRny0nq3o
http://www.youtube.com/watch?v=QEaOfTuveGg
http://www.youtube.com/watch?v=yWexhOmkVKU
http://www.youtube.com/watch?v=OmlX3IHb0JE
http://www.youtube.com/watch?v=1HAAF4UT75o
http://www.youtube.com/watch?v=VE-Os0Nujk8
http://www.youtube.com/watch?v=qBcI9WakS2o
http://www.youtube.com/watch?v=5TOQATLbT6I
http://www.youtube.com/watch?v=TaxJwC_MP9Q
http://www.youtube.com/watch?v=K90ZLYpfGsY

Statistical Finance:
Issues in Financial Mathematics and Statistics
http://www.youtube.com/watch?v=bIiQRRllJoA
http://www.youtube.com/watch?v=vTs2IQ8OefQ
http://www.youtube.com/watch?v=WMkD8HKJQCM
http://www.youtube.com/watch?v=D3aHciiVdvQ
http://www.youtube.com/watch?v=DhX0PGG-baI
http://www.youtube.com/watch?v=0ZLNbxWH8Lc
http://www.youtube.com/watch?v=FfLL0ItNOv8
http://www.youtube.com/watch?v=QJzrnCc4r3U
http://www.youtube.com/watch?v=NvJARkFjm1Q

Math:
http://www.youtube.com/watch?v=0AXeBcfKe_A

Books:
Head First Statistics
Head First Data Analysis

Data mining
Predictive analysis

The star schema data warehouse takes data from many transactional systems and copies it into a common format, using a relational database design completely different from that of a transactional system, typically containing many star schema configurations.

Each star schema has a central fact table representing events or occurrences. These have measures, which are always numeric and can therefore be counted, summed, or averaged. Measures could include the count of cases filed, the count of dispositioned cases (as in the example in this demonstration), or the fines assessed per case in an analysis of criminal cases.

A single fact table can have many measures, so long as they pertain to the same dimensions. If important measures do not share all of their dimensions, and many of them will not, we can create other fact tables in the star schema data warehouse and relate those facts to their relevant dimensions.
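
To make this concrete, here is a minimal SQL sketch of the court-case star described above. All table and column names are hypothetical, invented for this illustration.

  -- Dimension tables: one row per member, keyed by a surrogate key.
  CREATE TABLE dim_date (
      date_id   INT PRIMARY KEY,  -- surrogate ("technical") key
      full_date DATE,
      month     INT,
      quarter   INT,
      year      INT
  );

  CREATE TABLE dim_county (
      county_id   INT PRIMARY KEY,
      county_name VARCHAR(100)
  );

  CREATE TABLE dim_judge (
      judge_id   INT PRIMARY KEY,
      judge_name VARCHAR(100)
  );

  -- Central fact table: one row per event, holding only dimension keys
  -- and numeric measures that can be counted, summed, or averaged.
  CREATE TABLE fact_disposition (
      date_id        INT REFERENCES dim_date (date_id),
      county_id      INT REFERENCES dim_county (county_id),
      judge_id       INT REFERENCES dim_judge (judge_id),
      cases_disposed INT,
      fines_assessed DECIMAL(12, 2)
  );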

An interesting feature of the star schema data warehouse is that end users intuitively understand it. They understand that they are looking at some related measures that can be filtered by relevant dimensions.

One challenge in designing and constructing a data warehouse is determining, out of all the tables, rows, and columns available, which measures define the relevant facts on which to base our system.

Another challenge concerns how we address the problems previously identified with the existing data.

Remember that one of the problems was that the data lived in databases from different vendors, each with its own schema. In the past, this meant the data warehouse was commonly populated by writing low-level programs, specific to the database vendor and the database content, to do all the work. Remember also that the data was not coded consistently, in that similar items may be coded using different courthouse terminology. The differences may be small and obvious, or large and subtle. There may be misspellings of key terms, or names may be presented differently. All of these issues affect the quality of the data and the effort necessary to transform it into a star schema data warehouse.

Performing the tasks associated with moving, correcting, and transforming the data from the transactional systems into the star schema data warehouse is called extraction, transformation, and loading (ETL).

Even when the measures and dimensions are accurately specified, a shortcoming of the star schema data warehouse is that simple queries can only get one piece of data at a time: the intersection of a measure with a single value from each of the dimensions. In order to compare many values in a useful way, greater querying ability is needed.
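
In SQL terms, reusing the hypothetical schema sketched earlier, a simple query pins every dimension to one member and returns a single cell, while a comparison requires grouping:

  -- One cell: a measure at the intersection of one member per dimension.
  SELECT SUM(f.cases_disposed)
  FROM fact_disposition f
  JOIN dim_date   d ON d.date_id   = f.date_id
  JOIN dim_county c ON c.county_id = f.county_id
  WHERE d.year = 2012 AND c.county_name = 'Example County';

  -- Comparing many values at once means grouping over dimension members.
  SELECT c.county_name, d.year, SUM(f.cases_disposed) AS dispositions
  FROM fact_disposition f
  JOIN dim_date   d ON d.date_id   = f.date_id
  JOIN dim_county c ON c.county_id = f.county_id
  GROUP BY c.county_name, d.year;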

Analysis Services stores data in yet another format called the cube. The term cube is a convenience, because usually there will be more than three dimensions or variables against which we wish to aggregate and analyze the data. For illustration, let's consider a cube that has only three dimensions: a cube that measures the count of dispositions of court cases. Along one axis, we have the dimension of time. Along another we have county. And across the third, we have judges.

To keep the cube simple, we use a subset of the values that these dimensions might take on. The values a dimension can take on are called members. If we set one member of each dimension to a fixed value, a single measure will result, but the cube lets the user view many of these cells at a time, which gives us a broader comparative view of the data. Because the data is pre-aggregated, we can adjust our querying parameters in real time without having to wait 30 minutes for a report-generating tool to count all the detail records of a transactional database to populate the cells of the cube.
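
The pre-aggregation a cube provides can be approximated in plain SQL with GROUP BY CUBE (available in SQL Server and Oracle, among others), which computes every combination of the listed dimension members, including subtotals, in one pass. This is only a sketch of the idea, not how Analysis Services actually stores a cube.

  -- Disposition counts for every combination of year, county, and judge,
  -- including subtotal and grand-total rows.
  SELECT d.year, c.county_name, j.judge_name,
         SUM(f.cases_disposed) AS dispositions
  FROM fact_disposition f
  JOIN dim_date   d ON d.date_id   = f.date_id
  JOIN dim_county c ON c.county_id = f.county_id
  JOIN dim_judge  j ON j.judge_id  = f.judge_id
  GROUP BY CUBE (d.year, c.county_name, j.judge_name);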

Let's look at a cube through Analysis Services, based on data from several test databases that have been placed in a single star schema data warehouse. This data is either not actual data, or actual data that has been modified for testing.

Data can be allocated along some dimensions, and merged across others.

The important area of the spreadsheet for use with Analysis Services is called a pivot table. There are places on the pivot table for placing measures and placing dimensions. Dimensions can also be placed to filter the entire view of the data. We will place our first measure in the central area of the pivot table. This cube currently has only one measure defined, but if multiple measures existed for this cube, we could display one or more of them at a time, side by side. Here we see the total count of dispositions contained in this cube. Virtually every cube has time as a dimension, so we will place time across the horizontal axis of the pivot table.
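
A similar cross-tab view can be produced in T-SQL with the PIVOT operator. This is a hypothetical sketch against the earlier schema, with made-up years in the IN list.

  -- Dispositions by county (rows) and year (columns), pivot-table style.
  SELECT county_name, [2010], [2011], [2012]
  FROM (
      SELECT c.county_name, d.year, f.cases_disposed
      FROM fact_disposition f
      JOIN dim_date   d ON d.date_id   = f.date_id
      JOIN dim_county c ON c.county_id = f.county_id
  ) AS src
  PIVOT (SUM(cases_disposed) FOR year IN ([2010], [2011], [2012])) AS p;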

What is a pivot table?

Users view the data through a predefined, rigid interface and can only query in the manner previously configured for them by programmers. If we perform an analysis to discover the key data columns that comprise the measures and dimensions of all the relevant databases and database types, we can extract this data, massage it into a common format, and load it into a star schema relational data warehouse. While the design of these data warehouses is often understood by end users, this presentation of the data has its limitations as well. If there are derived and calculated measures and dimensions, such as the year-to-year change calculations shown in the Excel demonstration, formulas for these and for calculations specific to the subject matter of the cube must be defined. The data is then processed from the star schema data warehouse and loaded into the Analysis Services cubes. Users are then ready to use the data through Excel, or through web pages using Excel-like features.

BI: Getting the right information into the right people's hands in a format that allows them to understand the data quickly. It is not just about decision support. BI is about monitoring changes: identifying lagging / growing metrics, understanding why, understanding the context behind the numbers, and identifying trends in various areas of your organization. React to changes, and cause changes.

What is a data cube?

What is a dimension?

What is a measure?

A data warehouse is a database that collects, integrates, and stores an organization's data, with the aim of producing accurate, timely information and providing a means of analyzing data to support management decisions.

Analysis often requires huge amounts of data to be processed, which is often a problem; for example, the OLTP database tables may be locked during retrieval. A data warehouse can be completely detached from the information system, even running on a different system.

An OLTP system's data model is rarely optimized for analysis. We all learn to develop systems using normalized databases modeled after our entity relationships. This is a good thing for an information system, but it makes querying for large sums of aggregated data a costly operation. Furthermore, redundancy is rarely part of such a database design, because redundancy is hard to maintain, often causing data inconsistencies or worse. For data analysis, redundancy can be great, because it speeds up the process.

Data in an OLTP system might change over time. A customer might move to another country, leaving you, the data analyst, with an impossible choice: either you update the customer's record, discarding their previous state and invalidating previous analyses, or you somehow create a new record in the online system and change all the references to it. Neither is desirable, but in a data warehouse you can keep both the old and the new state of the customer and specify at what point in time the change occurred.

Each data source consists of a single fact table linked with multiple dimension tables.

As we create and link dimensions, an important rule is to never use the existing keys from the online system, because we have no control over how they might change or even disappear. Instead, every dimension gets its own surrogate key, called the "technical key", which is unique to the data warehouse. This also goes for the fact table.
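
A minimal sketch of this rule in SQL (MySQL-style syntax; user_dim, user_id, and the field names are taken from later in this document, the rest is hypothetical): the online system's natural key is carried along only as a lookup attribute, while the warehouse-generated surrogate key does the linking.

  CREATE TABLE user_dim (
      user_id   INT AUTO_INCREMENT PRIMARY KEY,  -- technical key, owned by the warehouse
      email     VARCHAR(255),                    -- natural key from the online system, lookup only
      full_name VARCHAR(255),
      company   VARCHAR(255),
      gender    CHAR(1)
  );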

The first step in constructing a data warehouse is to populate it with data from the OLTP system. This step is known as ETL (Extract, Transform, Load): extract the data needed for the fact and dimension tables from all the different data sources, transform it to fit our needs, and load it into the data warehouse so it can be queried.

In order to be able to fill the central fact table, the keys to all the dimensions must be known.

Two types of dimensions (not to be confused with Ralph Kimball's slowly changing dimension types):

  1. Dimensions consisting of data already known to the online system (the data exists in another table in the online system)
  2. Dimensions that are to be generated from the fact data and surrounding sources

We generate or update the dimensions of the second type (dimensions that are to be generated from the fact data and surrounding sources) while filling the fact table, and thus know their keys at that time. However, we cannot do this for dimensions of the first type.

In our example, the time and page dimensions are of the second type: they are generated when updating our fact table. The user dimension has to be known before then, because it is based on independent tables in the online system. Because of this, we will fill our data warehouse in two separate transactions, ensuring first the existence of our type 1 user dimension, and later the other dimensions and the fact table itself.

Updating type 1 dimensions:

Because the data exists in the source system as a separate table, all we have to do is read it and run it through Spoon's "Dimension lookup / update" step. This step is capable of creating, updating, and looking up "slowly changing dimensions" as described by Ralph Kimball. For sources that contain changing data, such as the customer records mentioned earlier, it can do this in two ways:

  • Overwriting the existing record with the updated one
  • Creating another dimension record, maintaining multiple copies of changed records over time

The second option is implemented by adding "version", "date_from", and "date_to" fields to the dimension's table, and it is almost always the more useful one.
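
As a rough illustration of what the "Dimension lookup / update" step effectively does for such a change, in SQL (assuming the user_dim sketched earlier has been extended with the version, date_from, and date_to fields just described; the record values are hypothetical):

  -- 1. Close off the current version of the changed record.
  UPDATE user_dim
  SET date_to = CURRENT_DATE
  WHERE email = 'alice@example.com'
    AND date_to = '9999-12-31';        -- sentinel meaning "still current"

  -- 2. Insert the new state as a fresh version; a new technical key
  --    (user_id) is generated automatically.
  INSERT INTO user_dim (email, full_name, company, gender,
                        version, date_from, date_to)
  VALUES ('alice@example.com', 'Alice Example', 'Acme Ltd', 'F',
          2, CURRENT_DATE, '9999-12-31');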

Because we create this dimension directly from a table in the online system, the key field to use for lookups in this dimension is trivial: it is the key used in the online system, which in our case is the user's email. We will not use this field as a key in our data warehouse; we replace it with a technical key unique to our data warehouse. The natural key is merely needed so that we can retrieve this technical key later, as we will need it to link the dimension to the fact table. In our case, this technical key field is called "user_id".

In the "Fields" tab, we specify the fields that are important to this dimension, such as hierarchical data. In our case, we keep track of information like the user's full name, the company he works for, his / her gender.

In the case of type 2 dimensions, the "Version field" is used to keep track of the different versions in a slowly changing dimension. The date ranges are used to indicate the period of validity for each version. If the online system keeps track of when records change, it is useful to use the "Stream Datefield" for more accurate validity ranges, but there are many more applications possible, which are beyond the scope of this document. Usually, the default will suffice.

Updating type 2 dimensions and the fact table:

Now that all independent dimensions have been prepared, it is time to populate the fact table. In this process, the remaining dimensions can be updated as well, having their technical keys ready when needed.

We start by reading the basic grain for the fact table from the request log. Then the date dimension is generated, which is done by using the JavaScript step on the timestamp field in the request log. Using the "Dimension lookup / update" step, we put this data into our data warehouse and keep a reference to it in the form of the generated technical key "time_id", which is added to the stream as an extra column.
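
Expressed in SQL rather than Kettle steps, the date-dimension derivation amounts to something like the following (table and column names are hypothetical; the real "Dimension lookup / update" step also skips dates that already exist and generates the time_id surrogate key):

  -- Derive one date_dim row per distinct day seen in the request log.
  INSERT INTO date_dim (full_date, day_of_month, month, quarter, year)
  SELECT DISTINCT
         DATE(r.request_timestamp),
         DAY(r.request_timestamp),
         MONTH(r.request_timestamp),
         QUARTER(r.request_timestamp),
         YEAR(r.request_timestamp)
  FROM request_log r;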

Next up is the user dimension. Remember that we already updated this in the previous step; now all we need to do is link the corresponding entry to each request in the fact table. Through a complex lookup, using the session key to match each request to a login action from the actions table, we are able to finally match a user to each request. If no user can be found for this session, we set its key (the email) to NULL, so it will match a special case created by Spoon's "Dimension lookup / update" step: the unknown user.

Because filtering creates separate threads, we sort the user streams afterwards using Spoon's sort step. When we reach the "Lookup user_dim" step, the stream contains an extra field named "email", which contains the email of each request, or NULL if it is not known. Remember that we specified "email" as the lookup key when updating this dimension earlier; using this field, the transformation step can find the technical key for each entry in the stream, adding it as an extra field called "user_id".

While filling the page dimension table with hierarchical data about each request (domain, path, and page), the newly generated technical key field "page_id" is added to the stream.

Finally, the data that is to go into the fact table is filtered so that only the technical keys to the dimensions and the grain's facts remain. It is then inserted into the request_fact table.
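
Put together, the fact load amounts to resolving each request to its dimensions' technical keys and inserting only those keys plus the measures. A hypothetical SQL equivalent of the Kettle transformation (assuming the email has already been resolved from the session as described above, and that request_count is the single measure):

  INSERT INTO request_fact (time_id, user_id, page_id, request_count)
  SELECT d.time_id,
         COALESCE(u.user_id, 0) AS user_id,   -- assumed key of the "unknown user" row
         p.page_id,
         1                                    -- grain: one row per request
  FROM request_log r
  JOIN date_dim d ON d.full_date = DATE(r.request_timestamp)
  LEFT JOIN user_dim u
         ON u.email = r.email                 -- NULL email falls through to unknown user
        AND r.request_timestamp BETWEEN u.date_from AND u.date_to
  JOIN page_dim p ON p.domain = r.domain AND p.path = r.path AND p.page = r.page;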

Next, we need to tell our OLAP server (Mondrian) about our data warehouse's structure. This is done using a fairly well-documented XML format, containing information about how the OLAP cubes, their dimensions, hierarchies, and measures are to be built.

In our case, the schema contains only one cube, which is called "Requests". The request cube is linked directly to the "request_facts" table. It has one measure, and four dimensions.

ROLAP requires a properly prepared data source in the form of a star or snowflake schema that defines a logical multi-dimensional database and maps it to a physical database model. Once we have our initial data structure in place, we must design a descriptive layer for it in the form of a Mondrian schema, which consists of one or more cubes, hierarchies, and members.

We don't have to worry too much about getting the model exactly right on the first try; just cover all of your anticipated business needs. Part of the process is coming back to the data warehouse design step and making changes to your initial data model after you've discovered your needs. What are the implications each time we redo, improve, or change our data warehouse? Do we have to tear it down and set it up from scratch? Do we have to clean up? What does this imply for our existing operations / services?

  1. Design a star or snowflake schema. The entire process starts with a data warehouse.
  2. Populate the star / snowflake schema. After the data model is designed, the next step is to populate it with actual data, thereby creating your data warehouse.
  3. Build a Mondrian schema. Now that our initial data warehouse is complete, we must build a Mondrian schema to organize and describe it in terms that Pentaho Analysis can understand.

Do all business intelligence applications require a data warehouse?

No. Not all data warehouses are used for business intelligence, nor do all business intelligence applications require a data warehouse.

What are the layers in a data warehouse architecture?

  1. OLTP
  2. ETL
  3. Metadata: There are dictionaries for the entire warehouse and sometimes dictionaries for the data that can be accessed by a particular reporting and analysis tool.
  4. Informational access layer: The data accessed for reporting and analysis, and the tools for reporting and analyzing data. Business intelligence tools fall into this layer.

What are the three main approaches for designing a data warehouse?

Famous authors and data warehouse experts Ralph Kimball and Bill Inmon give two different design methodologies for building a data warehouse.

  1. Kimball’s approach is more of a bottom-up design, where data marts are first created for specific subject / business areas and provide reporting and analysis capability. These data marts are then combined to create a data warehouse. This approach provides a quicker way to get the data ready for individual subjects / businesses. The major task in this design is maintaining the dimensions across multiple data marts.
  2. Inmon has defined a data warehouse as a centralized repository for the entire enterprise, in which the data warehouse is designed using a normalized enterprise data model. Data at the lowest level of detail is stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse. Inmon states that the data warehouse is: Subject-oriented, Non-volatile and Integrated. This methodology generates highly consistent dimensional views of data across data marts since all data marts are loaded from the centralized repository. Top-down design has also proven to be robust against business changes. Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task. The main disadvantage to the top-down methodology is that it represents a very large project with a very broad scope. The up-front cost for implementing a data warehouse using the top-down methodology is significant, and the duration of time from the start of project to the point that end users experience initial benefits can be substantial. In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.
  3. Hybrid: combines these two approaches to provide a more comprehensive and robust data warehouse design.

See http://www.dwhworld.com/dwh-architecture/

Why are the schemas in a data warehouse typically designed at a level less than third normal form?

The star and snowflake schemas are most commonly found in dimensional data warehouses and data marts where speed of data retrieval is more important than the efficiency of data manipulation. As such, the tables in these schemas are not heavily normalized, and are frequently designed at a level of normalization short of third normal form. See http://www.dwhworld.com/dwh-schemas/
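
For example, a star-style page dimension deliberately stores the domain redundantly on every row, short of 3NF, while a snowflake design would normalize it out into a linked table. A hypothetical SQL sketch, reusing the page dimension from the ETL example above:

  -- Star: one denormalized dimension table; the domain repeats per page.
  CREATE TABLE page_dim (
      page_id INT AUTO_INCREMENT PRIMARY KEY,
      domain  VARCHAR(255),   -- repeated for every page on the same domain
      path    VARCHAR(255),
      page    VARCHAR(255)
  );

  -- Snowflake: the same hierarchy normalized into linked tables.
  CREATE TABLE domain_dim (
      domain_id INT AUTO_INCREMENT PRIMARY KEY,
      domain    VARCHAR(255)
  );
  CREATE TABLE snowflaked_page_dim (
      page_id   INT AUTO_INCREMENT PRIMARY KEY,
      domain_id INT REFERENCES domain_dim (domain_id),
      path      VARCHAR(255),
      page      VARCHAR(255)
  );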

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License