MicroStrategy – Michael Sandberg's Data Visualization Blog

MicroStrategy World Europe 2013 – CEO Michael J. Saylor Keynote Video


This past week, MicroStrategy World Europe 2013 was held in Barcelona, Spain. As usual, Michael Saylor, Founder and CEO of MicroStrategy, Inc. provided the keynote address of the conference.

I listened to Michael’s presentation today and want to think about it a bit before I post some commentary.

In the meantime, here is a link to the video of his presentation. Just click on the image below.

Regards,

Michael

[Click image to watch video presentation]



Filed under: Business Intelligence, Michael J. Saylor, MicroStrategy, MicroStrategy World, Mobile Technology

MicroStrategy Makes Debut in Gartner Mobile App Development Platform Magic Quadrant


MicroStrategy Mobile

Gartner has just published its 2013 Magic Quadrant for Mobile Application Development Platforms. This year, MicroStrategy made its debut in the report, being positioned by Gartner in the Challengers quadrant.

Get free access to the report, compliments of MicroStrategy, here.

For six consecutive years, Gartner has positioned MicroStrategy’s Analytics Platform in its Magic Quadrant for Business Intelligence. “MicroStrategy Mobile is now independently recognized for the value it offers customers searching specifically for a mobile app platform,” says MicroStrategy President Paul Zolfaghari. “MicroStrategy Mobile brings to life analytics, transactions, workflows and multimedia in platform-built custom native apps for tablets and smartphones. As a result, we have seen hundreds of our customers use our mobile platform to build apps that reach beyond BI and include enterprise workflow and transactional processing.”

Here is the actual Magic Quadrant and synopsis of MicroStrategy discussed in detail in the Gartner report.

Gartner Mobile Magic Quadrant

MicroStrategy

Widely known as a business intelligence vendor, MicroStrategy offers a comprehensive MADP supporting the creation, deployment and measurement of transactional, as well as information-centric, applications. Cross-platform development is performed using a visual tool to compose and configure UI and functional elements, and to tune the layout for the necessary screen sizes and form factors. Client-side functionality is delivered through a MicroStrategy-developed container, leveraging native widgets on each of the supported platforms. The result is a cross-platform application that has a native user experience. In addition to MicroStrategy’s analytics server, a mobility server is provided to support user management, security, data storage, back-end integration and dynamic provisioning of applications to the client devices. Somewhat unusual for a vendor at the enterprise end of the MADP landscape, MicroStrategy offers a hands-on approach to platform evaluation through a downloadable free trial, as well as a free perpetual license for up to 10 users. It also offers a paid two-week QuickStrike program that includes strategic guidance, professional graphics design, development support and user testing.

Appropriate Use: MicroStrategy is a good choice for mobile scenarios that span the life cycle of data creation, capture and analysis, or where the desire for reduced time to benefit or to support rapidly evolving business requirements outweighs the need for standards-based development.

Strengths

  • MicroStrategy’s platform and customer engagement model emphasizes the creation of polished UIs, resulting in business applications that have visual appeal in customer-facing business scenarios, such as side-by-side selling.
  • The platform’s tooling and metadata-driven architecture lend themselves to RAD and iteration, while supporting enterprise requirements for security and reliability.
  • MicroStrategy’s advanced data analytics and mobile dashboard capabilities can be used to monitor and measure application usage patterns, and to mine and analyze business data.
  • As with other platforms based on a managed-client/server architecture, data synchronization is provided to support offline functionality.

Cautions

  • While MicroStrategy’s mobility platform is suitable for transactional applications, the inclusion of MicroStrategy’s analytics server makes the platform primarily desirable for scenarios that also require data analytics.
  • The toolset’s visual approach to AD is not, as of yet, suited to scenarios that require significant customization.

Filed under: Gartner, Magic Quadrant, MicroStrategy, Mobile Technology

MicroStrategy “Data Science to Business Value” – Arpit Agrawal


Note: I received this blog posting in an e-mail. I generally agree, in principle, with Arpit’s assessment of MicroStrategy. I thought I would share his thoughts and, if time permits, offer my own in a future blog.

Best Regards,

Michael

MicroStrategy “Data Science to Business Value” by Arpit Agrawal (blogged on IT Central Station on August 14, 2013)

When a customer wants to implement BI, there are lots of options in the market to choose from. The decision depends on various parameters such as scale of implementation, cost and performance. They can either go with an open source option, or they can invest in a paid BI tool. The decision can also be driven by the company’s strategic factors. I personally don’t believe in a tool-agnostic approach; I feel every customer has its own BI needs and some specific requirements. No tool in this world can meet every one of these requirements, but in my experience MicroStrategy provides a complete product that meets most customer requirements. This is my first product review of a business intelligence tool: MicroStrategy.

MicroStrategy is a provider of enterprise software platforms for business intelligence. When we talk about MicroStrategy, the first thought that comes to my mind is the company’s core focus, which is delivering business intelligence solutions, unlike other companies whose focus is spread across different areas. This, in my opinion, is MicroStrategy’s core strength as a company. Mike Saylor (CEO) gave similar insights at the MicroStrategy World conference. According to him, the biggest advantage of MicroStrategy as a company is that it is the largest independently owned BI platform. There are a lot of companies in the market that acquire small firms to improve their technology and product, but MicroStrategy is not one of them. The fact that MicroStrategy has resisted the temptation to build a technology portfolio through acquisitions has become its core strength: it has always believed in its core framework, and everything was developed by its own team. According to the BI Survey 12, conducted by the German-based Business Application Research Center (BARC), MicroStrategy received the highest ranking in numerous KPIs (key performance indicators) such as total performance, user recommendation, mobile, query performance, data volumes and big data.

MicroStrategy has deep market penetration and is regularly used for industry-level solutions in domains such as retail, healthcare, banking, manufacturing and social media. There are several key advantages to using MicroStrategy, which I will elaborate on below.

  • Core functionality and ease of development - Having worked on different reporting platforms, I feel that most of the functionality is directly available out of the box in MicroStrategy. The MicroStrategy platform supports distributed development, so many developers can work on a project at the same time. Most objects are reusable, and their definitions can be shared across other business areas. It also offers extensive control over the formatting of reports and dashboards. It supports a large number of databases and next-generation data sources, along with multi-sourcing. The security architecture of MicroStrategy makes design and implementation very powerful: you can have row-level, object-level, application-level, and other forms of security. Any developer with a fair amount of MicroStrategy development experience would be able to list the product’s features and limitations; the limitations are few, and there are gaps that can be handled using the SDK.
  • Visual data discovery - For any BI tool, ease of use is very important, along with sophisticated analytics presented through the best possible visualization framework. The size of data is increasing day by day, and that is not going to change in the near future. It has become very important for a BI tool to analyze these enormous data volumes in the most digestible format. Data visualization is driving demand in the business intelligence (BI) market because it’s intuitive and accessible to business users who aren’t schooled in query languages or statistical analysis. MicroStrategy has a strong standing in these areas due to the large number of visualizations in its library, which sets it apart from other competitors as a tool for visual exploration. Features like Flash widgets and interactive Flash PDF export make it a very versatile tool. The advanced dashboard visualizations make dashboards appealing and are the best way to represent data in a more intuitive format.
  • Impromptu BI - There are customers in the market who look for impromptu BI as a solution for their BI needs. MicroStrategy provides a solution called Visual Insight, which meets this purpose and is best suited for such cases. MicroStrategy Visual Insight empowers you to discover insights from your data using compelling visualizations. Users can use any of the out-of-the-box visualizations, or they can create their own custom visualizations using the SDK. Once the base cube is developed, it takes only 10-20 minutes to create visualizations for the analysis, and this is what most business users want these days. They are the people who understand their data and want to play around with it. Other than Visual Insight, there are solutions developed using object prompts, where users have the option to select what they want to view in a report. This makes MicroStrategy stand out from its competitors.
  • Mentoring and support offering - For any business intelligence tool, the level and depth of customer support is becoming increasingly critical as BI becomes more integrated into organizations’ operations. Better product support results in higher application success rates and helps ensure customers get full value from their investments. MicroStrategy, on that note, has one of the best customer support offerings in the industry. It lets you report your issues and business-case enhancements to MicroStrategy, which can be considered for a subsequent release, hotfix or patch depending on the business impact. The support offering also helps provide solutions to MicroStrategy developers and administrators for the issues they face during development. Anyone striving to maximize the business benefits of MicroStrategy should prioritize a customer support strategy. Apart from the excellent support offering, MicroStrategy hosts customer training events at different geographic locations, as it is important to keep decision makers across the company educated on the tool. End-user training and mentoring are very important for the success of any BI tool. MicroStrategy also hosts an annual user conference called MicroStrategy World, which consists of a lot of sessions; it is also a good opportunity for individuals looking for more networking opportunities around MicroStrategy.
  • Performance and stability - MicroStrategy provides unmatched performance and stability on a consolidated BI platform.
  • Social media and big data - MicroStrategy was one of the first vendors to certify integration with Amazon Redshift, and it constantly demonstrates leadership and innovation in both major dimensions of Big Data analytics, making it one of the leading innovators in the big data space. When we talk about BI, social media analytics cannot be neglected. Many market research firms rely on data from social media sites like Facebook, Twitter and LinkedIn to get insights. MicroStrategy’s social intelligence platform includes a number of applications that help enterprises harness the power of social networks for marketing and e-commerce.
  • MicroStrategy Mobile - Michael Saylor, the founder of MicroStrategy, has also written a book called “The Mobile Wave: How Mobile Intelligence Will Change Everything,” and I believe it is Saylor’s vision of mobile technology that has made MicroStrategy grow on the mobile side. The Mobile App Platform moves business intelligence reporting from paper or the desktop to the mobile device. Workers are no longer chained to their desks, or reliant on paper-based documents that are obsolete by the time they reach their target audience. Mobile applications provide functionality that is unmatched by paper or a web browser. MicroStrategy has created some really amazing apps for iOS and Android users. It also has a 10-day program called Mobile QuickStrike in which it delivers a fully functional app; this gives customers confidence in the overall strength of the tool. MicroStrategy’s mobile intelligence platform helps companies and organizations build, deploy, and maintain mobile apps across a range of solutions by embedding intelligence, transactions, and multimedia into apps.
  • Integration and certification with other platforms - The MicroStrategy tool is certified with various reporting databases and platforms, which makes it easy to use across organizations. The MicroStrategy team adds new certifications and support with every release of the product.

Every tool has limitations, and MicroStrategy is no exception. I will list a few:

  • Sometimes it can be hard to quantify the ROI (return on investment) on software, as the returns come indirectly through better-informed workers and decision makers across the company. The tool can be of limited use for some companies, and a MicroStrategy system still cannot be afforded by most of them. When they plan to buy the product, there are various purchase options: the cost can be a few hundred dollars per user for a user-based license, or as high as a million dollars for a CPU-based license, depending on the performance they want from their system. In the past few years MicroStrategy has started tailoring its services toward medium and small-sized businesses, but the fact is that many such firms do not consider the product essential, as it is hard to quantify the ROI. This stands true for other paid tools as well, and that is the reason open source tools are gaining some market share these days.
  • Having worked as a developer on MicroStrategy, I see impact analysis as a pain point in MicroStrategy. MicroStrategy is an object-oriented tool, and in the business world change is inevitable. When there is a request for change and you need to find the impact of an object in MSTR, the tool does a recursive search and doesn’t give the list of objects in one view, which makes it very difficult for any developer to know the overall impact of the object inside a big project implementation. Other BI tools, by contrast, list all the objects dependent on a particular object in a single view, which can be exported to Excel for tracking the change and validating test cases.
  • Version control is not a strong point for MicroStrategy, and I feel there is scope for improvement in this area.
  • Development in MicroStrategy is done mostly in Desktop. Although a lot of features have been added to Web in the latest release, I feel full support for web-based development is important and hope it will improve in future releases.
  • A pain point for some organizations these days is to find a good MicroStrategy resource. Getting a Cognos/BO/OBIEE expert from the IT market is comparatively easier than finding a good MicroStrategy resource.

These are some advantages and disadvantages of the tool. Compared to the advantages, the disadvantages are minor improvements that I feel will be taken care of in future releases. I haven’t listed granular details, as I wanted to keep this review at a high level for product understanding. There are lots of features one can explore. When you weigh it all up, MicroStrategy is a great product overall. To conclude, you should never forget: “Owning a supercomputer gives you a special feeling, but the question is, are you really using it for what it is intended?”

Hope this review helps customers who are preparing their BI strategy. In case you need more insights, or you are planning a BI implementation and have some questions, do write to me via e-mail. I would be more than happy to help.

Disclosure: The company I (Arpit) work for is partners with several vendors.

The review is based on my professional experience; in case you feel that any information is inaccurate, please do point it out so that I can rectify it.

Filed under: Arpit Agrawal, Business Intelligence, Data Scientist, IT Central Station, MicroStrategy, MicroStrategy World

MicroStrategy Introduces Free Analytics Tool for the Desktop


MicroStrategy Analytics YouTube Video – Click Here

Last Tuesday, October 22, 2013, MicroStrategy revamped and expanded its line of BI software to incorporate big-data analytics and desktop visualization.

“We’re delivering a substantial new set of functionality,” said Kevin Spurway, MicroStrategy’s vice president of industry and mobile marketing.

The company has rebranded and upgraded its flagship BI application, now called the MicroStrategy Analytics Platform, and has introduced a new desktop application designed to allow business analysts to easily parse large data sets from different sources.

MicroStrategy Analytics Enterprise 9.4, a significant upgrade from MicroStrategy 9.3.1, includes a new capability the company calls data blending, which allows users to combine data from more than one source; the software stores the data in working memory without the need for a separate data integration product.

“Previously, we were able to combine data from different sources, but it required work from IT. Now any business user can grab data from different sources and bring them together with only a few clicks,” Spurway said.

Also new: The dashboard panel has been upgraded. It now can update data in real-time and can display multimedia files such as videos.

The new platform comes with a range of connectors for various types of big-data repositories. It can connect with the MongoDB NoSQL data store as well as Hadoop distributions from Hortonworks, Intel and Pivotal.

Analytics Enterprise now comes with the R statistical programming language, increasingly used for statistical analysis. Geographic Information System (GIS) software and service vendor Esri has provided a set of map skins and cartographic markers that can be used for geographic renditions of data sets.

MicroStrategy also has improved the performance of the software. The application can now fit 10 times as much data in memory as the previous version could, and the self-service querying now runs up to 40 percent faster.

In addition to updating its core enterprise software, MicroStrategy has also released a free tool to help business analysts fetch data from various sources and copy it directly to their desktops.

With the newly released MicroStrategy Analytics Desktop, users can grab data from relational databases, multidimensional databases, cloud-based applications and Hadoop deployments. Once on the desktop, the data can then be compiled into visualizations, such as basic pie charts, maps, graphs and various matrices.

You can click on the image below to download the software.

MicroStrategy Analytics Desktop Features

Click here to download free software


Filed under: Data Visualization, MicroStrategy

Data Archaeology Selected as One of the 2014 MicroStrategy World Dashboard Contest Winners


Click here to learn more about MicroStrategy World 2014

Hello Readers:

Data Archaeology, Inc.

I just found out I am one of the 2014 winners of the MicroStrategy World Dashboard Contest. I was also one of the winners last year.

I get a free pass to MicroStrategy World in Las Vegas, which is held the last week of this month. Last year, they gave us awards too. Not sure yet if they will do the same this year.

An Exploration of Tax Data

My dashboard is an exploration of tax data. It explores tax rates for the top ten countries in terms of GDP.

I used horizontal stacked bar charts so that the viewer can see how the social security and income tax rates add up to the total, which also explains visually why the countries are ordered the way they are on the dashboard. I also separated out the $100K and $300K percentages into separate visuals.

In addition, I added the flags of the countries. Yes, I know, chart junk!

You won’t see any numbers on the data points in this dashboard; instead, when you mouse over a bar, the country, category and percent value appear as a tooltip.

Here is a screenshot of my entry. It was written with MicroStrategy v9.3.1, Report Services and the Visualization SDK.

Best regards,

Michael

Click on image to enlarge

DA_An_Exploration_of_Tax_Data


Filed under: Dashboard Design, Data Archaeology, Michael J. Saylor, MicroStrategy, MicroStrategy World

Has MicroStrategy Toppled Tableau as the Analytics King?

MicroStrategy Analytics

In a recent TDWI article titled Analysis: MicroStrategy’s Would-Be Analytics King, Stephen Swoyer, who is a technology writer based in Nashville, TN, stated that business intelligence (BI) stalwart MicroStrategy Inc. pulled off arguably the biggest coup at Teradata Corp.’s recent Partners User Group (Partners) conference, announcing a rebranded, reorganized, and — to some extent — revamped product line-up.

One particular announcement drew great interest: MicroStrategy’s free version of its discovery tool — Visual Insight — which it packages as part of a new standalone BI offering: MicroStrategy Analytics Desktop.

With Analytics Desktop, MicroStrategy takes dead aim at insurgent BI offerings from QlikTech Inc., Tibco Spotfire, and — most particularly — Tableau Software Inc.

MicroStrategy rebranded its products into three distinct groups: the MicroStrategy Analytics Platform (consisting of MicroStrategy Analytics Enterprise version 9.4, an updated version of its v9.3.1 BI suite); MicroStrategy Express (its cloud platform, available in both software- and platform-as-a-service subscription options); and MicroStrategy Analytics Desktop (a single-user BI discovery solution). MicroStrategy Analytics Enterprise takes a page from Tableau’s book via support for data blending, a technique that Tableau helped to popularize.

“We’re giving the business user the tools to join data in an ad hoc sort of environment, on the fly. That’s a big enhancement for us. The architectural work that we did to make that enhancement work resulted in some big performance improvements [in MicroStrategy Analytics Enterprise]: we improved our query performance for self-service analytics by 40 to 50 percent,” said Kevin Spurway, senior vice president of marketing with MicroStrategy.

Spurway — who, as an interesting aside, has a JD from Harvard Law School — said MicroStrategy implements data blending in much the same way that Tableau does: i.e., by doing it in-memory. Previous versions of MicroStrategy BI employed an interstitial in-memory layer, Spurway said; the performance improvements in MicroStrategy Analytics Enterprise result from shifting to an integrated in-memory design, he explained.

“It’s a function of just our in-memory [implementation]. Primarily it has to do with the way the architecture on our end works: we used to have kind of a middle in-memory layer that we’ve removed.”

Spurway described MicroStrategy Desktop Analytics as a kind of trump card: a standalone, desktop-oriented version of the MicroStrategy BI suite — anchored by its Visual Insight tool and designed to address the BI discovery use case. Desktop Analytics can extract data from any ODBC-compliant data source. Like Enterprise Analytics, it’s powered by an integrated in-memory engine.

In other words: a Tableau-killer.

“That [Visual Insight] product has been out there but has always been kind of locked up in our Enterprise product,” he said, acknowledging that MicroStrategy offered Visual Insight as part of its cloud stack, too. “You had to be a MicroStrategy customer who obviously has implemented the enterprise solution, or you could get it through Express, [which is] great for some people, but not everybody wants a cloud-based solution. With [MicroStrategy Desktop Analytics], you go to our website, download and install it, and you’re off and running — and we’ve made it completely free.”

The company’s strategy is that many users will, as Spurway put it, “need more.” He breaks the broader BI market into two distinct segments — with a distinct, Venn-diagram-like area of overlap.

“There’s a visual analytics market. It’s a hot market, which is primarily being driven by business-user demand. Then there’s the traditional business intelligence market, and that market has been there for 20 years. It’s not growing as quickly, and there’s some overlap between the two,” he explained.

“The BI market is IT-driven. For business users, they need speed, they need better ways to analyze their data than Excel provides; they don’t want impediments, they need quick time to value. The IT organization cares about … things … [such as] traditional reporting [and] information-driven applications. Those are apps that are traditionally delivered at large scale and they have to rely on data that’s trusted, that’s modeled.”

If or when users “need more,” they can “step up” to MicroStrategy’s on-premises (Enterprise Analytics) or cloud (Express) offerings, Spurway pointed out. “The IT organization has to support the business users, but they also need to support the operationalization of analytics,” he argued, citing the goal of embedding analytics into the business process. “That can mean a variety of things. It can mean a very simple report or dashboard that’s being delivered every day to a store manager in a Starbucks. They’re not going to need Visual Insight for something like that — they’re not going to need Tableau. They need something that’s simplified for everyday usage.”


Something More, Something Else

Many in the industry view self-service visual discovery as the culmination of traditional BI.

One popular narrative holds that QlikTech, Tableau, and Spotfire helped establish and popularize visual discovery as an (insurgent) alternative to traditional BI. Spurway sought to turn this view on its head, however: Visual discovery, he claimed, “is a starting point. It draws you in. The key thing that we bring to the table is the capability to bridge the gap between traditional model, single-version-of-the-truth business intelligence and fast, easy, self-service business analytics.”

In Spurway’s view, the usefulness or efficacy of BI technologies shouldn’t be plotted on a linear time-line, e.g., anchored by greenbar reports on the extreme left and culminating in visual discovery on the far right. Visual discovery doesn’t complete or supplant traditional BI, he argued, and it isn’t inconceivable that QlikTech, Tableau, and Spotfire — much like MicroStrategy and all of the other traditional BI powers that now offer visual discovery tools as part of their BI suite — might augment their products with BI-like accoutrements.

Instead of a culmination, Spurway sees a circle — or, better still, a möbius strip: regardless of where you begin with BI, at some point — in a large enough organization — you’re going to traverse the circle or (as with a möbius strip) come out the other side.

There might be something to this. From the perspective of the typical Tableau enthusiast, for example, the expo floor at last year’s Tableau Customer Conference (TCC), held just outside of Washington, D.C. in early September, probably offered a mix of the familiar, the new, and the plumb off-putting. For example, Tableau users tend to take a dim view of traditional BI, to say nothing of the data integration (DI) or middleware plumbing that’s associated with it: “Just let me work already!” is the familiar cry of the Tableau devotee. However, TCC 2013 played host to several old-guard exhibitors — including IBM Corp., Informatica Corp., SyncSort Inc., and Teradata Corp. — as well as upstart players such as WhereScape Inc. and REST connectivity specialist SnapLogic Inc.

These vendors weren’t just exhibiting, either. As a case in point, Informatica and Tableau teamed up at TCC 2013 to trumpet a new “strategic collaboration.” As part of this accord, Informatica promised to certify its PowerCenter Data Virtualization Edition and Informatica Data Services products for use with Tableau. In an on-site interview, Ash Parikh, senior director of emerging technologies with Informatica, anticipated MicroStrategy’s Spurway by arguing that organizations “need something more.” MicroStrategy’s “something more” is traditional BI reporting and analysis; Informatica’s and Tableau’s is visual analytic discovery.

“Traditional business intelligence alone does not cut it. You need something more. The business user is demanding faster access to information that he wants, but [this] information needs to be trustworthy,” Parikh argued. “This doesn’t mean people who have been doing traditional business intelligence have been doing something wrong; it’s just that they have to complement their existing approaches to business intelligence,” he continued, stressing that Tableau needs to complement — and, to some extent, accommodate — enterprise BI, too.

“From a Tableau customer perspective, Tableau is a leader in self-service business intelligence, but Tableau [the company] is very aware of the fact that if they want to become the standard within an enterprise, the reporting standard, they need to be a trusted source of information,” he said.

Among vendor exhibitors at TCC 2013, this term — “trusted information” or some variation — was a surprisingly common refrain. If Tableau wants to be taken seriously as an enterprisewide player, said Rich Dill, a solutions engineer with SnapLogic, it must be able to accommodate the diversity of enterprise applications, services, and information resources. More to the point, Dill maintained, it must do so in a way that comports with corporate governance and regulatory strictures.

“[Tableau is] starting to get into industries where audit trails are an issue. I’ve seen a lot of financial services and healthcare and insurance businesses here [i.e., at TCC] that have to comply with audit trails, auditability, and logging,” he said. In this context, Dill argued, “If you can’t justify in your document where that number came from, why should I believe it? The data you’re making these decisions on came from these sources, but are these sources trusted?”

Mark Budzinski, vice president and general manager with WhereScape, offered a similar — and, to be sure, similarly self-serving — assessment. Tableau, he argued, has “grown their business by appealing to the frustrated business user who’s hungry for data and analytics anyway they can get it,” he said, citing Tableau’s pioneering use of data blending, which he said “isn’t workable [as a basis for decision-making] across the enterprise. You’re blending data from all of these sources, and before you know it, the problem that the data’s not managed in the proper place starts to rear its ugly head.”

Budzinski’s and WhereScape’s pitch — like those of IBM and Teradata — had a traditional DM angle. “There’s no notion of historical data in these blends and there’s no consistency: you’re embedding business rules at the desktop, [but] who’s to say that this rule is the same as the [rule used by the] guy in the next unit. How do you ensure integrity of the data and [ensure that] the right decisions were made? The only way to do that is in some data warehouse-, data mart-[like] thing.”

Stephen Swoyer can be reached at stephen.swoyer@spinkle.net.


Filed under: MicroStrategy, Tableau, TDWI

MicroStrategy to focus on customers, not ‘PowerPoint slides,’ at MicroStrategy World conference

Source: Chris Kanaracus, IDG News Service, PCWorld, Business & Finance Software

Paul Zolfaghari
President, MicroStrategy

While some vendor conferences can end up mired in technical minutiae, MicroStrategy believes it’s better to show, not just tell, customers how its BI (business intelligence) software works, according to its president, Paul Zolfaghari.

More than 50 MicroStrategy customers will deliver presentations at the event, which has about 130 sessions planned in total, according to a statement. They include BMC Software, Flextronics, Nielsen, Panda Restaurant Group and Publicis Touchpoint Solutions.

Scheduled for keynotes are Facebook CIO Tim Campos and Gucci CIO Simone Pacciarini, who will discuss their use of MicroStrategy technology.

When it does discuss products at the event, MicroStrategy plans to showcase its recently released Analytics Desktop, a self-service BI tool that is available at no charge, as well as its push into mobile BI, Zolfaghari said.

Mobility has transformed the BI market, in Zolfaghari’s view. Five or six years ago, companies largely ran some internal reports and rolled the results up the corporate food chain, he said. “What’s happened is BI has now moved massively outside of HQ.”

It’s also likely MicroStrategy will discuss the massively parallel in-memory computing architecture it’s been working on with Facebook. The technology should be commercially available from MicroStrategy later this year, showing up first in MicroStrategy’s cloud BI offering, according to Zolfaghari.

The conference comes as MicroStrategy, the industry’s last remaining large pure BI vendor, faces ever-stiffer competition from platform companies such as Oracle and SAP, as well as upstarts like Tableau and Birst.

But MicroStrategy is keeping an edge thanks to a number of key strategic decisions, according to a recently released Forrester Research report on the BI market.

“MicroStrategy has grown organically and architected its entire suite as a single platform,” analyst Boris Evelson wrote. “Forrester clients find that, after making the initial investment and effort in MicroStrategy, the reusability of all objects and the relational OLAP engine with drill-anywhere capability often result in a lower long-term total cost of ownership.”

Forrester clients are also having success rolling out mobile BI based on MicroStrategy’s platform, Evelson said.

But there’s some cause for concern over MicroStrategy’s “high reliance on a largely disappearing network of partners, many of which have been acquired,” for architectural components such as ETL (extract, transform and load), data quality and MDM (master data management), Evelson added.

Zolfaghari downplayed the impact of its partners being acquired, noting that Informatica, a major provider of such tools, remains independent. MicroStrategy also maintains “robust relationships” with companies such as IBM, SAP and Oracle, he said.

MicroStrategy World runs from Jan. 27-30.

Chris Kanaracus covers enterprise software and general technology breaking news for the IDG News Service.


Filed under: Michael J. Saylor, MicroStrategy, MicroStrategy World, Paul Zolfaghari, PCWorld

PRIME: MicroStrategy Announces Release of Cloud Based, In-Memory Analytics Service, Running at Multi-Terabyte Scale


MicroStrategy Cloud’s New Parallel Relational In-Memory Engine (PRIME) Provides High Performance On Big Data Allowing Companies to Build High-Scale, Easy-to-Use Information Driven Apps

Las Vegas, NV, January 28, 2014 – MicroStrategy® Incorporated (Nasdaq: MSTR), a leading worldwide provider of enterprise software platforms, today announced the availability of its new Parallel Relational In-Memory Engine (PRIME) option for the MicroStrategy Cloud™ at its annual user conference, MicroStrategy World 2014, in Las Vegas. MicroStrategy PRIME™ is a massively scalable, cloud-based, in-memory analytics service designed to deliver extremely high performance for complex analytical applications that have the largest data sets and highest user concurrency. Facebook has successfully built high value information-driven applications with the technology that powers MicroStrategy PRIME.

“Rising data volumes are fueling demand for compelling, easy-to-use analytical applications with the power to revolutionize existing business processes for thousands or tens of thousands of employees, customers, or partners,” said Michael Saylor, CEO, MicroStrategy Incorporated. “MicroStrategy PRIME has been built from the ground up to support the engineering challenges associated with development of these powerful new information-driven apps. This innovative service will allow organizations to derive maximum value from their information by making their Big Data assets actionable.”

Most organizations struggle to harness the value of the information in their Big Data stores due to poor performance. Big Data technologies can store large amounts of information, but distributing that information in an interactive manner to thousands of users with existing commercially available technologies is a huge challenge, often resulting in risky, multi-year projects. MicroStrategy PRIME breaks new ground by tightly coupling a state-of-the art visualization and dashboarding engine with an innovative massively parallel in-memory data store. This architecture allows companies to build highly interactive applications that deliver responses to hundreds of thousands of users in a fraction of the time and cost of other approaches. MicroStrategy PRIME acts as a performance accelerator, opening up the data in databases to a much larger user population, driving new demand for information.
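MicroStrategy has not published PRIME’s internals, so the toy sketch below is only meant to illustrate the general pattern the release describes: partition the data across workers, aggregate each in-memory partition independently, and then combine the partial results. Every name and number in it is invented for illustration and has nothing to do with the actual product.

```python
# Toy sketch of partitioned, parallel in-memory aggregation. This is NOT
# MicroStrategy PRIME; it only illustrates the general "partition the work
# across cores, then combine partial results" pattern described above.
from multiprocessing import Pool
import random

def aggregate_shard(shard):
    """Aggregate one in-memory partition: total revenue per region."""
    totals = {}
    for region, amount in shard:
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def parallel_revenue_by_region(rows, workers=4):
    # Partition the rows into one shard per worker process.
    shards = [rows[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(aggregate_shard, shards)
    # Combine the partial aggregates into the final result.
    combined = {}
    for partial in partials:
        for region, subtotal in partial.items():
            combined[region] = combined.get(region, 0.0) + subtotal
    return combined

if __name__ == "__main__":
    regions = ["AMER", "EMEA", "APAC"]
    rows = [(random.choice(regions), random.random()) for _ in range(100_000)]
    print(parallel_revenue_by_region(rows))
```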

MicroStrategy PRIME combines:

  • Massively parallel, distributed, in-memory architecture for extreme scale. MicroStrategy PRIME is built on an in-memory, highly distributed, massively parallel architecture, designed to run on cost effective commodity hardware. Complex analytics problems can be partitioned across hundreds of CPU cores and nodes to achieve unprecedented performance. MicroStrategy has worked closely with leading hardware vendors to take full advantage of today’s multi-core, high memory servers.
  • Tightly integrated dashboard engine for beautiful, easy-to-use applications. MicroStrategy PRIME includes a state-of-the-art dashboard and data exploration engine, built on the MicroStrategy Analytics Platform™. The visualization engine includes hundreds of optimizations designed specifically for the in-memory data store. This engine enables customers to build complete, immersive applications that deliver high-speed response.
  • Cloud-based delivery for rapid deployment. MicroStrategy PRIME is available as a service on MicroStrategy Cloud, MicroStrategy’s world-class Cloud Analytics platform. MicroStrategy Cloud offers a complete service, including the infrastructure, people and processes to enable customers to quickly and easily develop and deploy high-scale, information-driven applications.

About MicroStrategy Incorporated

Founded in 1989, MicroStrategy (Nasdaq: MSTR) is a leading worldwide provider of enterprise software platforms. The Company’s mission is to provide the most flexible, powerful, scalable and user-friendly platforms for analytics, mobile, identity and loyalty, offered either on premises or in the cloud.

The MicroStrategy Analytics Platform™ enables leading organizations to analyze vast amounts of data and distribute actionable business insight throughout the enterprise. Our analytics platform delivers reports and dashboards, and enables users to conduct ad hoc analysis and share their insights anywhere, anytime. MicroStrategy Mobile™ lets organizations rapidly build information-rich applications that combine multimedia, transactions, analytics, and custom workflows. The MicroStrategy Identity Platform™ (branded as MicroStrategy Usher™) provides organizations the ability to develop a secure mobile app for identity and credentials. The MicroStrategy Loyalty Platform™ (branded as MicroStrategy Alert) is a next-generation, mobile customer loyalty and engagement solution. To learn more about MicroStrategy, visit www.microstrategy.com and follow us on Facebook and Twitter.

MicroStrategy, MicroStrategy Analytics Platform, MicroStrategy Mobile, MicroStrategy Identity Platform, MicroStrategy Loyalty Platform, MicroStrategy Usher, MicroStrategy Cloud and MicroStrategy PRIME are either trademarks or registered trademarks of MicroStrategy Incorporated in the United States and certain other countries. Other product and company names mentioned herein may be the trademarks of their respective owners.


Filed under: Analytics, Cloud, In-Memory, MicroStrategy, MicroStrategy World

Michael Saylor’s MicroStrategy World 2014 Keynote Presentation

An Introduction to Data Blending – Part 1 (Introduction, Visual Analysis Life-cycle)


Readers:

Today I am beginning a multi-part series on data blending.

  • Parts 1, 2 and 3 will be an introduction and overview of what data blending is.
  • Part 4 will review an illustrative example of how to do data blending in Tableau.
  • Part 5 will review an illustrative example of how to do data blending in MicroStrategy.

I may also include a Part 6, but I have to see how my research on this topic continues to progress over the next week.

Much of Parts 1, 2 and 3 is based on a research paper written by Kristi Morton from The University of Washington (and others) [1].

Please review the source references, at the end of each blog post in this series, to be directed to the source material for additional information.

I hope you find this series helpful for your data visualization needs.

Best Regards,

Michael

Introduction

Tableau and MicroStrategy’s new Analytics Platform are commercial business intelligence (BI) software tools that support interactive, visual analysis of data. [1] 

Using a Web-based visual interface to data and a focus on usability, these tools enable a wide audience of business partners (IT’s end-users) to gain insight into their datasets. The user experience is a fluid process of interaction in which exploring and visualizing data takes just a few simple drag-and-drop operations (no programming skills or DB experience is required). In this context of exploratory, ad-hoc visual analysis, we will explore a feature originally introduced in Tableau v6.0, and in MicroStrategy’s new Analytics Platform v9.4.1 late last year (2013).

We will examine how we can integrate large, heterogeneous data sources. This feature is called data blending, which gives users the ability to create data visualization mashups from structured, heterogeneous data sources dynamically without any upfront integration effort. Users can author visualizations that automatically integrate data from a variety of sources, including data warehouses, data marts, text files, spreadsheets, and data cubes. Because data blending is workload driven, we are able to bypass many of the pain points and uncertainty in creating mediated schemas and schema-mappings in current pay-as-you-go integration systems.

The Cycle of Visual Analysis

Unlike databases, our human brains have limited capacity for managing and making sense of large collections of data. In database terms, the feat of gaining insight into big data is often accomplished by issuing aggregation and filter queries (producing subsets of data).

However, this approach can be time-consuming. The user is forced to complete the following tasks (a minimal sketch of this workflow appears after the list).

  1. Figure out what queries to write.
  2. Write the queries.
  3. Wait for the results to be returned in textual format. And then, finally,
  4. Read through these textual summaries (often containing thousands of rows) to search for interesting patterns or anomalies.
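For a concrete feel of steps 2 through 4, here is a minimal, self-contained sketch using Python’s built-in sqlite3 module; the sales table, its columns and its figures are invented purely for illustration.

```python
# A minimal sketch of the manual query-and-scan workflow described above.
# The "sales" table and its columns are invented for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("West", "A", 120.0), ("West", "B", 80.0),
     ("East", "A", 200.0), ("East", "B", 40.0)],
)

# Step 2: write the aggregation-and-filter query by hand.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 100
    ORDER BY total DESC
"""

# Steps 3 and 4: wait for the rows, then read the textual summary,
# scanning it by eye for interesting patterns or anomalies.
for region, total in conn.execute(query):
    print(f"{region}: {total:,.2f}")
```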

Tools like Tableau and MicroStrategy help bridge this gap by providing a visual interface to the data. This approach removes the burden of having to write queries. The user can ask their questions through visual drag-and-drop operations (again, no queries or programming experience required). Additionally, answers are displayed visually, where patterns and outliers can quickly be identified.

Visualizations leverage the powerful human visual system to help us digest large amounts of information and make sense of it more quickly.

Cycle of Visual Analysis

Image: Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau. [1]

Figure 1, above, illustrates how visualization is a key component in turning information into knowledge and knowledge into wisdom.

Ms. Morton discusses the process as follows,

The process starts with some task or question that a knowledge worker (shown at the center) seeks to understand. In the first stage, the user forages for data that may contain relevant information for their analysis task. Next, they search for a visual structure that is appropriate for the data and instantiate that structure. At this point, the user interacts with the resulting visualization (e.g. drill down to details or roll up to summarize) to develop further insight.

Once the necessary insight is obtained, the user can then make an informed decision and take action. This cycle is centered around and driven by the user and requires that the visualization system be flexible enough to support user feedback and allow alternative paths based on the needs of the user’s exploratory tasks. Most visualization tools, however, treat this cycle as a single, directed pipeline, and offer limited interaction with the user. Moreover, users often want to ask their analytical questions over multiple data sources. However, the task of setting up data for integration is orthogonal to the analysis task at hand, requiring a context switch that interrupts the natural flow of the analysis cycle. We extend the visual analysis cycle with a new feature called data blending that allows the user to seamlessly combine and visualize data from multiple different data sources on-the-fly. Our blending system issues live queries to each data source to extract the minimum information necessary to accomplish the visual analysis task.

Often, the visual level of detail is at a coarser level than the data sets. Aggregation queries, therefore, are issued to each data source before the results are copied over and joined in Tableau’s local in-memory view. We refer to this type of join as a post-aggregate join and find it a natural fit for exploratory analysis, as less data is moved from the sources for each analytical task, resulting in a more responsive system.

Finally, Tableau’s data blending feature automatically infers how to integrate the datasets on-the-fly, involving the user only in resolving conflicts. This system also addresses a few other key data integration challenges, including combining datasets with mismatched domains or different levels of detail and dirty or missing data values. One interesting property of blending data in the context of a visualization is that the user can immediately observe any anomalies or problems through the resulting visualization.

These aforementioned design decisions were grounded in the needs of Tableau’s typical BI user base. Thanks to the availability of a wide-variety of rich public datasets from sites like data.gov, many of Tableau’s users integrate data from external sources such as the Web or corporate data such as internally-curated Excel spreadsheets into their enterprise data warehouses to do predictive, what-if analysis.

However, the task of integrating external data sources into their enterprise systems is complicated. First, such repositories are under strict management by IT departments, and often IT does not have the bandwidth to incorporate and maintain each additional data source. Second, users often have restricted permissions and cannot add external data sources themselves. Such users cannot integrate their external and enterprise sources without having them collocated.

An alternative approach is to move the data sets to a data repository that the user has access to, but moving large data is expensive and often untenable. We therefore architected data blending with the following principles in mind: 1) move as little data as possible, 2) push the computations to the data, and 3) automate the integration challenges as much as possible, involving the user only in resolving conflicts.

Next: Data Blending Overview

——————————————————————————————————–

References:

[1] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.


Filed under: Data Blending, Kristi Morton, MicroStrategy, Tableau

An Introduction to Data Blending – Part 2 (Hans Rosling, Gapminder and Data Blending)


Readers:

In Part 1 of this series on data blending, we began to explore the concepts of data blending as well as the life-cycle of visual analysis.

Today, in Part 2 of this series, we will dig deeper into how data blending works.

Again, much of Parts 1, 2 and 3 is based on a research paper written by Kristi Morton from The University of Washington (and others) [1].

You can learn more about Ms. Morton’s research as well as other resources used to create this blog post by referring to the References at the end of the blog post.

Best Regards,

Michael

Data Blending Overview

Data Blending allows an end-user to dynamically combine and visualize data from multiple heterogeneous sources without any upfront integration effort. [1] A user authors a visualization starting with a single data source – known as the primary – which establishes the context for subsequent blending operations in that visualization. Data blending begins when the user drags in fields from a different data source, known as a secondary data source. Blending happens automatically, and only requires user intervention to resolve conflicts. Thus the user can continue modifying the visualization, including bringing in additional secondary data sources, drilling down to finer-grained details, etc., without disrupting their analytical flow. The novelty of this approach is that the entire architecture supporting the task of integration is created at runtime and adapts to the evolving queries in typical analytical workflows.

A Simple Illustrative Example

In this section we will discuss a scenario in which three unique data sources (see left half of Figure 1 below for sample tables) are blended together to create the visualization shown in Figure 2 below. This is a simple, yet compelling mashup of three unique measures that tells an interesting story about the complexities of global infant mortality rates in the year 2000.

Figure 1

 

Image: Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau. [1]

In this example, the user wants to understand if there is a connection between infant mortality rates, GDP, and population. She has three distinct spreadsheets with the following characteristics: the first data source contains information about the infant mortality rates per 1000 live births for each country, the second contains information about each country’s total population, and the third source contains country-level GDP. For this analysis task, the user drags the fields, “Country or Area” and “Infant mortality rate per 1000 live births”, from her first data source onto the blank visual canvas. Since these fields were the first ones selected by the user, the data source associated with them becomes the primary data source.

This action produces a visualization showing the relative infant mortality rates for each country. But the user wants to understand if there is a correlation between GDP and infant mortality, so she then drags the “GDP per capita in US dollars” field onto the current visual canvas from Data Table A. The step to join the GDP measure from this separate data source happens automatically: the blending system detects the common join key (i.e., “Country or Area”) and combines the GDP data with the infant mortality data for each country. Finally, to complete her analysis task, she adds the “Population” measure from Data Table B to the visual canvas, which produces the visualization in Figure 2 below, associated with the blended data table in Figure 1.

 

Figure 2

Image: Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau. [1] 
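As a rough sketch of what the blend described above produces behind the scenes, the pandas snippet below joins three small tables on the shared “Country or Area” field. The country names and figures are invented placeholders rather than the Figure 1 data, and the automatic key detection and visual canvas of the real tools are, of course, not reproduced here.

```python
# A rough sketch of the blend described above: three separate sources,
# each keyed by "Country or Area", combined into one local blended view.
# The countries and figures are invented placeholders, not the Figure 1 data.
import pandas as pd

mortality = pd.DataFrame({       # primary data source
    "Country or Area": ["Country A", "Country B", "Country C"],
    "Infant mortality rate per 1000 live births": [5.2, 23.7, 61.4],
})
gdp = pd.DataFrame({             # secondary source (Data Table A in the text)
    "Country or Area": ["Country A", "Country B", "Country C"],
    "GDP per capita in US dollars": [41_000, 9_800, 1_900],
})
population = pd.DataFrame({      # secondary source (Data Table B in the text)
    "Country or Area": ["Country A", "Country B", "Country C"],
    "Population": [9_500_000, 48_000_000, 12_300_000],
})

# The primary source sets the context; each secondary source is joined in
# on the shared "Country or Area" field, mirroring the automatic key match.
blended = (
    mortality
    .merge(gdp, on="Country or Area", how="left")
    .merge(population, on="Country or Area", how="left")
)
print(blended)
```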

Hans Rosling, Gapminder and Data Blending

The Gapminder World interactive graph below combines different data sources to show how long people live and how the number of children a woman has relate to how much money people earn.

Gapminder World for Windows

Image: Hans Rosling’s Wealth and Health of Nations (Gapminder.org) [2]

Hans Rosling

In the screenshot above, the y-axis shows us Children per woman (total fertility). The x-axis shows us Income per person (GDP/capita, PPP$ inflation-adjusted). The series data points (the bubbles) show us the population of each country. If you were to click the Play button, you would see, as an interactive “slide show,” how countries have developed since 1800.

This demonstrates the flexibility of the data blending feature, namely that users can dynamically change their blended views by pivoting on different data sources and measures to blend in their visualizations.

In the screenshot below, Mr. Rosling explains how to use the interactive Gapminder World application.

Also, Mr. Rosling has provided Gapminder World Offline, which you can use to show animated statistics from your own laptop! It can be run on Windows, Mac and Linux. Here is a link to the download installation page on the Gapminder.org site.

And here is a link to the PDF for the Gapminder World Guide shown below.

Gapminder World Guide

Image: Hans Rosling’s Gapminder World Guide (PDF) [2]

Next: Usage Scenarios and Design Principles

——————————————————————————————————–

References:

[1] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.

[2] Hans Rosling, Wealth & Health of Nations, Gapminder.org, http://www.gapminder.org/world/.

 


Filed under: Data Blending, Gapminder, Hans Rosling, Kristi Morton, MicroStrategy, Tableau

An Introduction to Data Blending – Part 3 (Benefits of Blending Data)


Readers:

In Part 2 of this series on data blending, we delved deeper into understanding what data blending is. We also examined how data blending is used in Hans Rosling’s well-known Gapminder application.

Today, in Part 3 of this series, we will dig even deeper by examining the benefits of blending data.

Again, much of Parts 1, 2 and 3 is based on a research paper written by Kristi Morton from The University of Washington (and others) [1].

You can learn more about Ms. Morton’s research as well as other resources used to create this blog post by referring to the References at the end of the blog post.

Best Regards,

Michael

Benefits of Blending Data

In this section, we will examine the advantages of using the data blending feature for integrating datasets. Additionally, we will review another illustrative example of data blending using Tableau.

Integrating Data Using Tableau

In Ms. Morton’s research, Tableau was equipped with two ways of integrating data. First, in the case where the data sets are collocated (or can be collocated), Tableau formulates a query that joins them to produce a visualization. However, in the case where the data sets are not collocated (or cannot be collocated), Tableau federates queries to each data source, and creates a dynamic, blended view that consists of the joined result sets of the queries. For the purpose of exploratory visual analytics, Ms. Morton (et al) found that data blending is a complementary technology to the standard collocated approach with the following benefits:

  • Resolves many data granularity problems
  • Resolves collocation problems
  • Adapts to needs of exploratory visual analytics

Figure 1 - Company Tables

Image: Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau. [1]

Resolving Data Granularity Problems

Oftentimes a user wants to combine data that may not be at the same granularity (i.e., they have different primary keys). For example, let’s say that an employee at company A wants to compare the yearly growth of sales to a competitor, company B. The dataset for company B (see Figure 1 above) contains detailed quarterly sales for B (quarter, year is the primary key), while company A’s dataset only includes yearly sales (year is the primary key). If the employee simply joins these two datasets on year, then each row from A will be duplicated for each quarter in B for a given year, resulting in an inaccurate overestimate of A’s yearly sales.

This duplication problem can be avoided if, for example, company B’s sales dataset is first aggregated to the level of year and then joined with company A’s dataset. In this case, data blending detects that the data sets are at different granularities by examining their primary keys, and notes that in order to join them, the common field is year. To join them on year, an aggregation query is issued to company B’s dataset, which returns the sales aggregated up to the yearly level, as shown in Figure 1. This result is blended with company A’s dataset to produce the desired visualization of yearly sales for companies A and B.

The blending feature does all of this on the fly, without user intervention.
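
To make the aggregate-then-join pattern concrete, here is a minimal pandas sketch of the same idea with made-up numbers for companies A and B; it is only an illustration of the pattern, not Tableau’s implementation.

import pandas as pd

# Company A: yearly sales (primary key: year)
a = pd.DataFrame({"year": [2010, 2011], "a_sales": [400, 480]})

# Company B: quarterly sales (primary key: quarter + year)
b = pd.DataFrame({
    "year":    [2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011],
    "quarter": [1, 2, 3, 4, 1, 2, 3, 4],
    "b_sales": [90, 100, 110, 120, 105, 115, 125, 135],
})

# A naive join on year duplicates each A row once per B quarter,
# so summing a_sales per year overstates it four-fold.
naive = a.merge(b, on="year")
print(naive.groupby("year")["a_sales"].sum())   # 1600, 1920 -- wrong

# Blending-style fix: aggregate B up to the year level first,
# then join the two result sets on the common field.
b_yearly = b.groupby("year", as_index=False)["b_sales"].sum()
blended = a.merge(b_yearly, on="year", how="left")
print(blended)                                  # one row per year -- correct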

Resolves Collocation Problems

As mentioned in Part 1, a managed repository is expensive and untenable. In other cases, the data repository may have a rigid structure, as with cubes, to ensure performance, support security or protect data quality. Furthermore, it is often unclear if it is worth the effort of integrating an external data set that has uncertain value. The user may not know until she has started exploring the data if it has enough value to justify spending the time to integrate and load it into her repository.

Thus, one of the paramount benefits of data blending is that it allows the user to quickly start exploring their data; as they explore, the integration happens automatically as a natural part of the analysis cycle.

An interesting final benefit of the blending approach is that it enables users to seamlessly integrate across different types of data (which usually exist in separate repositories) such as relational, cubes, text files, spreadsheets, etc.

Adapts to Needs of Exploratory Visual Analytics

A key benefit of data blending is its flexibility; it gives the user the freedom to view their blended data at different granularities and control how data is integrated on-the-fly. The blended views are dynamically created as the user is visually exploring the datasets. For example, the user can drill-down, roll-up, pivot, or filter any blended view as needed during her exploratory analysis. This feature is useful for data exploration and what-if analysis.

Another Illustrative Example of Data Blending

Figure 2 (below) illustrates the possible outcomes of an election for District 2 Supervisor of San Francisco. With this type of visualization, the user can select different election styles and see how their choice affects the outcome of the election.

What’s interesting from a blending standpoint is that this is an example of a many-to-one relationship between the primary and secondary datasets. This means that the fields being left-joined in by the secondary data sources match multiple rows from the primary dataset, which results in those values being duplicated. Any subsequent aggregation operations would reflect this duplicate data, resulting in overestimates. The blending feature, however, prevents this scenario from occurring by performing all aggregation prior to duplicating data during the left-join.

Figure 2 - San Francisco Election

Image: Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau. [1]

Next: Data Blending Design Principles

——————————————————————————————————–

References:

[1] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.

[2] Hans Rosling, Wealth & Health of Nations, Gapminder.org, http://www.gapminder.org/world/.


Filed under: Data Blending, Kristi Morton, MicroStrategy, Tableau

An Introduction to Data Blending – Part 4 (Data Blending Design Principles)


Readers:

In Part 3 of this series on data blending, we examined the benefits of blending data. We also reviewed an example of data blending that illustrated the possible outcomes of an election for the District 2 Supervisor of San Francisco.

Today, in Part 4 of this series, we will discuss data blending design principles and show another illustrative example of data blending using Tableau.

Again, much of Parts 1, 2, 3 and 4 is based on a research paper written by Kristi Morton from The University of Washington (and others) [1].

You can learn more about Ms. Morton’s research as well as other resources used to create this blog post by referring to the References at the end of the blog post.

Best Regards,

Michael

Data Blending Design Principles

In Part 4, we describe the primary design principles upon which Tableau’s data blending feature was based. These principles were influenced by the application needs of Tableau’s end-users. In particular, the blending system was designed to integrate datasets on-the-fly, be responsive to change, and be driven by the visualization. Additionally, it was assumed that the user may not know exactly what she is looking for initially, and needs a flexible, interactive system that can handle exploratory visual analysis.

Push Computation to Data and Minimize Data Movement

Tableau’s approach to data visualization allows users to leverage the power of a fast database system. Tableau’s VizQL algebra is a declarative language for succinctly describing visual representations of data and analytics operations on the data. Tableau compiles the VizQL declarative formalism representing a visual specification into SQL or MDX and pushes this computation close to the data, where the fast database system handles computationally intensive aggregation and filtering operations. In response, the database provides a relatively small result set for Tableau to render. This is an important factor in Tableau’s choice of post-aggregate data integration across disparate data sources – since the integrated result sets must represent a cognitively manageable amount of information, the data integration process operates on small amounts of aggregated, filtered data from each data source. This approach avoids the costly migration effort to collocate massive data sets in a single warehouse, and continues to leverage fast databases for performing expensive queries close to the data.
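
As a rough illustration of that division of labor, the sketch below uses Python’s built-in sqlite3 module as a stand-in for a fast database: the filtering and aggregation run inside the database, and only a small aggregated result set comes back to the client. The table and column names are invented for the example.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, order_date TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('East', '2012-01-15', 120.0),
        ('East', '2012-02-03', 80.0),
        ('West', '2012-01-20', 200.0),
        ('West', '2012-03-11', 50.0);
""")

# The expensive work (filtering + aggregation) is pushed to the database;
# only a handful of aggregated rows are returned for rendering.
query = """
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    WHERE order_date >= '2012-01-01'
    GROUP BY region
"""
small_result_set = conn.execute(query).fetchall()
print(small_result_set)   # e.g. [('East', 200.0), ('West', 250.0)]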

Automate as Much as Possible, but Keep User in Loop

Tableau’s primary focus has been on ease of use, since most of Tableau’s end-users are not database experts but come from a variety of domains and disciplines: business analysts, journalists, scientists, students, etc. This led them to take a simple, pay-as-you-go integration approach in which the user invests minimal upfront effort or time to receive the benefits of the system. For example, the data blending system does not require the user to specify schemas for their data sets; rather, the system tries to infer this information as well as how to apply schema matching techniques to blend them for a given visualization. Furthermore, the system provides a simple drag-and-drop interface for the user to specify the fields for a visualization, and if there are fields from multiple data sources in play at the same time, the blending system infers how to join them to satisfy the needs of the visualization.

If something goes wrong, for example if the schema matching does not succeed, the blending system provides a simple interface for specifying data source relationships and how blending should proceed. Additionally, the system provides several techniques for managing the impact of dirty data on blending, which we discuss more in Part 5 of this series.

Another Example: Patient Falls Dashboard [3]

NOTE: The following example is from Jonathan Drummey via the Drawing with Numbers blog site. The example uses Tableau v7, but at the end of the instructions on how he creates this dashboard in Tableau v7, Mr. Drummey includes instructions on how the steps become simpler in Tableau v8. I have included a reference to this blog post on his site in the reference section of my blog entry. The “I”/“me” voice you read in this example is that of Mr. Drummey.

As part of improving patient safety, we track all patient falls in our healthcare system, and the number of patient days – the total of the number of days of inpatient stays at the hospital. Every month we report to the state our “fall rate,” a metric of the number of falls with injury for certain units in the hospital per 1000 patient days, i.e. days that patients are at the hospital. Our annualized target is to have less than 0.7 falls with injury per 1000 patient days.

A goal for our internal dashboard is to show the last 13 months of fall rates as a line chart, with the most recent fall events as a bar chart, in a combined chart, along with a separate text table showing some details of each fall event. Here’s the desired chart, with mocked-up data:

 

combo bars and lines

On the surface, blending this data seems really straightforward. We generate a falls rate every month for every reporting unit, so use that as the primary, then blend in the falls as they happen. However, this has the following issues:

  • Sparse Data – As I’m writing this, it’s March 7th. We usually don’t get the denominator of the patient days for the prior month (February) for a few more days yet, so there won’t be any February row of measure data to use as the primary to get the February fall events to show on the dashboard. In addition, there still wouldn’t be any March data to get the March fall events. Sometimes when working with a blend, the solution is to flip our choices for the primary and secondary data source. However, that doesn’t work either, because a unit might go for months or years without a patient fall, so there wouldn’t be any fall events to blend in the measure data.
  • Falls With and Without Injury – In the bar chart, we don’t just want to show the number of patient falls, we want to break down the falls by whether or not they were falls with injury – the numerator for the fall rate metric – and all other falls. The goal of displaying that data is to help the user keep in mind that as important as it is to reduce the number of falls with injury, we also need to keep the overall number of falls down as well. No fall = no chance of fall with injury.
  • Unit Level of Detail – Because the blend needs to work at the per-unit level of detail as well as across all reporting units, that means (in version 7 at least) that the Unit needs to be in the view for the blend to work. But we want to display a single falls rate no matter how many units are selected.

Sparse Data

To deal with issue of sparse data, there are a few possible solutions:

  • Change the combined line and bar chart into separate charts. This would perhaps be the easiest, though it would require some messing about with filters, hidden reference lines, and continuous date axes to ensure that the two charts had similar axis ranges no matter what. However, that would miss out on the key capability of the combined chart to directly see how a fall contributes to the fall rate. In addition, there would be no reason to write this blog post. :)
  • Perform padding in the data source, either via a query/view or Custom SQL. In an earlier version of this project I’d built this, and maintaining a bunch of queries with Cartesian joins isn’t my favorite cup of tea.
  • Building a scaffold data source with all combinations of the month and unit and using the scaffold as the primary data source. While possible, this introduces maintenance issues when there’s a need for additional fields at a finer level of detail. For example, the falls measure actually has three separate fall rates – monthly, quarterly, and annual. These are generated as separate rows in our measures data and the particular duration is indicated by the Period field. So the scaffold source would have to include the Period field to get the data, but then that could be too much detail for the blended fall event data, and make for more complexity in the calculations to make sure the aggregations worked properly.
  • Do a tiny bit of padding in the query, then do the rest in Tableau via Show Missing Values aka domain padding. As I’d noted in an earlier post on blending, domain padding occurs before data is blended so we can pad out the measure data through the current date and then include all the falls. This is the technique I chose, for the reason that padding one row to the data is trivial and turning on Show Missing Values is a couple of mouse clicks. Here’s how I did that:

In my case, the primary data source is a Microsoft Access query that gets the falls measure results from a table that also holds results for hundreds of other metrics that we track. I created a second query with the same number of columns that returns Null for every field except the Measure Date, which has a value of 1/1/1900. Then a third query UNION’s those two queries together, and that’s what is used as the data source in Tableau.

Then, in Tableau, I added a calculated field called Date with the following formula:

//used for padding out display to today
IF [Measure Date] == #1/1/1900# THEN 
    TODAY() 
ELSE 
    [Measure Date] 
END
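
For readers who want to see the padding idea outside of Tableau, here is a rough pandas analogue with invented measure rows: the 1/1/1900 placeholder row is mapped to today, and re-indexing on a full monthly range plays the role of Show Missing Values. This is only a sketch of the concept, not Mr. Drummey’s actual workflow.

import pandas as pd

# Hypothetical measure results: note that February has no row at all.
measures = pd.DataFrame({
    "measure_date": pd.to_datetime(["2012-12-31", "2013-01-31", "1900-01-01"]),
    "numerator":    [2, 1, None],
    "denominator":  [900, 950, None],
})

# Equivalent of the Date calculated field: map the 1/1/1900 padding row to today.
today = pd.Timestamp.today().normalize()
measures["date"] = measures["measure_date"].where(
    measures["measure_date"] != pd.Timestamp("1900-01-01"), today
)

# Rough analogue of Show Missing Values: re-index on a full monthly range so
# months with no measure row (February here) still appear on the axis.
monthly = (
    measures.set_index(measures["date"].dt.to_period("M"))
            .reindex(pd.period_range("2012-12", today.to_period("M"), freq="M"))
)
monthly["calculated_rate"] = monthly["numerator"] / monthly["denominator"]
print(monthly[["calculated_rate"]])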

The measure results data contains a row per measure, reporting unit, and the period. These are pre-calculated because the data is used in a variety of different outputs. Since in this dashboard we are combining the results across units, we can’t just use the rate, we need to go back to the original numerator and denominator. So, I also created a new field for the Calculated Rate:

SUM([Numerator])/SUM([Denominator])

Now it’s possible to start building the line chart view:

  1. Put the Month(Date) – the full month/year version as a discrete – on Columns, Calculated Rate on Rows, Period on the Color Shelf. This only shows the data that exists in the data source, including the empty value for the current month (March in this case):

 

Screen Shot 2013-03-09 at 1.11.25 PM

 

  2. Turn on Show Missing Values for Month(Date) to start domain padding. Now we can see the additional column(s) for the month(s) – February in this case, between January and the current month – that Tableau has added in:

 

Screen Shot 2013-03-09 at 1.14.19 PM

 

With a continuous (green pill) date, this particular set-up won’t work in version 8. Tableau’s domain padding is not triggered when the last value of the measure is Null. I’m hoping this is just an issue with the beta; I’ll revise this section with an update once I find out what’s going on.

Even though the measure data only has end-of-month dates, instead of using Exact Date for the month I used Month(Date) because of two combined factors: one is that the default import of most date fields from MS Jet sources turns them into DateTime fields; the second is that Show Missing Values won’t work on an Exact Date for a DateTime field – you have to assign an aggregation to the DateTime (even Second will work). This is because domain padding at this level can create an immense number of new rows and cause Tableau to run out of memory, so Tableau keeps the option off unless you want it. Also note that you can turn on Show Missing Values for an Exact Date for a Date field.

  3. Now for some cleanup steps: for the purposes of this dashboard, filter Period to remove Monthly (we do quarterly reporting), but leave in Null because that’s needed for the domain padding.
  4. Right-click Null on the Color Legend and Hide it. Again, we don’t exclude this because that would cause the extra row for the domain padding to fail.
  5. Set up a relative date filter on the Date field for the last 13 months. This filter works just fine with the domain padding.

Filtering on Unit

Here’s a complicating factor: If we add a filter on Unit, there’s a Null listed here:

 

Screen Shot 2013-03-09 at 1.18.31 PM

I’d just want to see the list of units. But if we filter that Null out, then we lose the domain padding; the last date is now January 2013:

 

Screen Shot 2013-03-09 at 1.18.58 PM

 

One solution here would be to alter the padding to add a padding row for every unit, instead of just one. But since Tableau doesn’t let us just hide elements in a filter, and there are more reporting units in our production data than we are displaying on the dashboards, yet the all-unit rate needs to include all of the data, I chose to use a parameter filter. Setting this up included a parameter with All and each of the units, and a calculated field called “Chosen Unit Filter” with the following formula, which is set to Filter on False:

[Choose Unit] == "All" OR [Choose Unit] == [Unit]

Falls With and Without Injury

In a fantasy world, to create the desired stacked bars I’d be able to drag the Number of Records from the secondary datasource, i.e. the number of fall events, drag an Injury indicator onto the Color Shelf, and be done. However, that runs into the issue of having a finer level of detail in the secondary than in the primary, which I’ll walk through solutions for in the next section. In this case, since there are only two different numbers, the easy way is to generate two separate measures, then use Measure Names/Measure Values to create the stacked bars – Measure Values on Rows, and Measure Names on the Color Shelf. Here’s the basic calculation for Falls with Injury:

SUM(IF [Injury] != "None" THEN 1 ELSE 0 END)

We’re using a row-level calculated field to generate the measure, and a slightly different calc for Falls w/out Injury.
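
For anyone following along outside Tableau, the same pair of measures can be sketched in pandas as conditional sums over some invented fall-event rows (an illustration only):

import pandas as pd

falls = pd.DataFrame({
    "month":  ["2012-04", "2012-04", "2012-04", "2012-05"],
    "injury": ["None", "Minor", "None", "Major"],
})

# Falls with injury vs. falls without injury, per month.
by_month = falls.groupby("month").agg(
    falls_with_injury=("injury", lambda s: int((s != "None").sum())),
    falls_without_injury=("injury", lambda s: int((s == "None").sum())),
)
print(by_month)
# 2012-04: 1 with injury, 2 without; 2012-05: 1 with injury, 0 without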

Unit Level of Detail

When we want to blend in Tableau at a finer level of detail and aggregate to a higher level, historically there have been three options:

  • Don’t use blending at all; instead, use a query to perform the “blend” outside of Tableau. In the case where there are totally different data sources, this can be more difficult but not impossible, for example by using one of the systems (or a different system) to create a federated data source, such as adding your Oracle table as an ODBC connection to your Excel data and then querying that. In this case, we don’t have to do that.
  • Use Tableau’s Primary Groups feature to “push” the detail from the secondary into the primary data source. This is a really helpful feature; the one drawback is that it’s not dynamic, so any time there are new groupings in the secondary it would have to be re-run. Personally, I prefer automating as much as possible, so I tend not to use this technique.
  • Set up the view with the needed dimensions in the view – on the Level of Detail Shelf, for example – and then use table calculations to do the aggregation. This is how I’ve typically built this kind of view.

Tableau version 8 adds a fourth option:

  • Tell Tableau what fields to blend on, then bring in your measures from the secondary.

I’ll walk through the table calculation technique, which works the same in version 7 and version 8, and then how to take advantage of v8’s new feature.

Using Table Calculations to Aggregate Blended Data

In order to blend the falls data at the hospital unit level, to make sure that we’re only showing falls for the selected unit(s), the Unit has to be in the view (on the Rows, Columns, or Pages Shelves, or on the Marks Card). Since we don’t actually need to display the Unit, the Level of Detail Shelf is where we’ll put that dimension. However, just adding that to the view leads to a bar for each unit; for example, for April 2012 one unit had one fall with injury and another had two, and two units each had two falls without injury.

 

Screen Shot 2013-03-09 at 1.30.27 PM

 

To control things like tooltips (along with performance in some cases), it’s a lot easier to have a single bar for each month/measure. To do that, we turn to a table calculation; here’s the Falls w/Injury for v7 Blend calculated field, set up in the secondary data source:

IF FIRST()==0 THEN
	TOTAL([Falls w/Injury])
END

This table calculation has a Compute Using of Unit, so it partitions on the Month of Date. The IF FIRST()==0 part ensures that there is only one mark per partition. I’m using the TOTAL() aggregation here because it’s easier to set up and maintain. The alternative is to use WINDOW_SUM(), but in Tableau prior to version 7 there are some performance issues, so the calc would be:

IF FIRST()==0 THEN
	WINDOW_SUM(SUM([Falls w/Injury]), 0, IIF(FIRST()==0,LAST(),0))
END

The extra 0, IIF(FIRST()==0,LAST(),0) arguments are necessary in version 7 to optimize performance; you can get rid of them in version 8.

You can also do a table calculation in the primary that accesses fields in the secondary; however, TOTAL() can’t be used across blended data sources, so you’d have to use the WINDOW_SUM version.

With a second table calculation for the Falls w/out Injury, now the view can be built, starting with the line chart from above:

  1. Add Measure Names (from the Primary) to Filters Shelf, filter it for a couple of random measures.
  2. Put Measure Values on the Rows Shelf.
  3. Click on the Measure Values pill on Rows to set the Mark Type to Bar.
  4. Drag Measure Names onto the Color Shelf (for the Measure Values marks).
  5. Drag Unit onto the Level of Detail Shelf (for the Measure Values marks).
  6. Switch to the Secondary to put the two Falls for v7 Blend calcs onto the Measure Values Shelf.
  7. Set their Compute Usings to Unit.
  8. Remove the 2 measures chosen in step 1.
  9. Clean up the view – turn on dual axes, move the secondary axis marks to the back, change the axis tick marks to integers, set axis titles, etc.

This is pretty cool: we’re using domain padding to fill in for non-existent data and then having a blend happen at one level of detail while aggregating to another, just for the second axis. Here’s the v7 workbook on Tableau Public:

Patient Falls Dashboard - Click on Image to go to Tableau Public

Patient Falls Dashboard – Click on image above to go to Tableau Public

Tableau Version 8 Blending – Faster, Easier, Better

For version 8, Tableau made it possible to blend data without requiring the linking fields in the view. Here’s how I built the above v7 view in v8:

  1. Add Measure Names (from the Primary) to Filters Shelf, filter it for a couple of random measures.
  2. Put Measure Values on the Rows Shelf.
  3. Click on the Measure Values pill on Rows to set the Mark Type to Bar.
  4. Drag Measure Names onto the Color Shelf (for the Measure Values marks).
  5. Switch to the Secondary and click the chain link icon next to Unit to turn on blending on Unit.
  6. Drag the Falls w/Injury and Falls w/out Injury calcs onto the Measure Values Shelf.
  7. Remove the 2 measures chosen in step 1.
  8. Clean up the view – turn on dual axes, move the secondary axis marks to the back, change the axis tick marks to integers, set axis titles, etc.

The results will be the same as v7.

Next: Tableau’s Data Blending Architecture

—————————————————————-

References:

[1] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.

[2] Hans Rosling, Wealth & Health of Nations, Gapminder.org, http://www.gapminder.org/world/.

[3] Jonathan Drummey, Tableau Data Blending, Sparse Data, Multiple Levels of Granularity, and Improvements in Version 8, Drawing with Numbers, March 11, 2013, http://drawingwithnumbers.artisart.org/tableau-data-blending-sparse-data-multiple-levels-of-granularity-and-improvements-in-version-8/.

 


Filed under: Data Blending, Kristi Morton, MicroStrategy, Tableau

An Introduction to Data Blending – Part 5 (Tableau’s Data Blending Architecture)


Readers:

In Part 4 of this series on data blending, we reviewed Tableau’s data blending design principles. We also reviewed an example of data blending in Jonathan Drummey’s Patient Falls Dashboard. [3]

Today, in Part 5 of this series, we will peel the onion a bit more and look at Tableau’s Data Blending Architecture.

Again, much of Parts 1 – 5 is based on a research paper written by Kristi Morton from The University of Washington (and others) [1].

You can learn more about Ms. Morton’s research as well as other resources used to create this blog post by referring to the References at the end of the blog post.

Best Regards,

Michael

Integrating Data in Tableau

In Part 5, we discuss in greater detail how data blending works. Then we discuss how a user builds visualizations with data blending, using several large datasets involving airline statistics.

Data Blending Architecture

Part 5 - Figure 1

The data blending system, shown in Figure 1 above, takes as input the VizQL query workload generated by the user’s GUI actions and data source schemas, and automatically infers how to query the data sources remotely and combine their results on-the-fly. The system features a two-tier mediator-based architecture in which the VizQL query workload is analyzed and partitioned at runtime based on the corresponding data source fields being used. The primary mediator initiates this process by removing the visual encodings from the VizQL query workload to yield an abstract query. The abstract query is partitioned for further processing by the primary mediator and one or more secondary mediators. The primary mediator creates the mediated schema for the given query workload. It then federates the abstract queries to the primary data source as well as the secondary mediators and their respective data sources. The wrappers compile the abstract queries into concrete SQL or MDX queries and instantiate the semantic mappings between the data sources and the mediated schema for each query. The primary mediator joins all the result sets returned from all data sources to produce the mediated result set used by the rendering system. [1]

Part 5 - Figure 2

Post-aggregate Join

A visualization is organized by its discrete fields into pages, partitions, colors, etc., and like a GROUP BY clause in SQL, these grouping fields comprise the primary key of the visualization. In a blended visualization, the grouping fields from the primary data source become the primary key of the mediated schema. In Figure 2 above, these are shown as the dark-green fields in the primary data source, and the light green fields represent the aggregated data. Each secondary data source must contain at least one field that matches a visualization grouping field in order to blend into the mediated schema. The matching fields in a secondary data source comprise its join key, and fields appear in the GROUP BY clause issued by the secondary mediator wrappers. The aggregated data from the secondary data source, shown in light-purple, is then left-joined along its join key into the mediated result set. Morton (et al) refer to this left-join of aggregated result sets as a post-aggregate join. [1]
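
A stripped-down pandas sketch of the post-aggregate join, with invented primary and secondary tables: each source is aggregated by its own grouping fields first, and the aggregated secondary result is then left-joined into the mediated result set along its join key. This is only an illustration of the pattern described above.

import pandas as pd

# Primary source, grouped by the visualization's grouping fields (year, carrier).
primary = pd.DataFrame({
    "year":    [2007, 2007, 2008, 2008],
    "carrier": ["AA", "UA", "AA", "UA"],
    "airfare": [310.0, 295.0, 330.0, 305.0],
})
primary_agg = primary.groupby(["year", "carrier"], as_index=False)["airfare"].mean()

# Secondary source only matches a subset of the primary key (year),
# so it is aggregated to that level before joining.
secondary = pd.DataFrame({
    "year":            [2007, 2007, 2008],
    "cost_per_gallon": [2.10, 2.30, 3.00],
})
secondary_agg = secondary.groupby("year", as_index=False)["cost_per_gallon"].mean()

# Post-aggregate (left) join: the secondary aggregates are replicated across
# repeated year values in the mediated result set, but nothing is double-counted
# because all aggregation has already happened.
mediated = primary_agg.merge(secondary_agg, on="year", how="left")
print(mediated)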

Primary Key Cardinality

There is a many-to-one mapping between the domain values of the primary key and those of the secondary join key, because the secondary join key is a subset of the primary key and contains only unique values in the aggregated secondary result set. Morton (et al) find that this approach is the most natural for augmenting a visualization with secondary data sources of uncertain value or quality, which is a common scenario for Tableau users.

Data blending supports many-to-one relationships between the primary and each secondary. This can occur when the secondary data source contains coarser-grained data than the mediated result set, as discussed in Part 3 of this series.

Since the join key in a secondary result set may match a subset of the blended result set primary key, portions of the secondary result set may be duplicated across repeated values in the mediated result set. This does not pose a risk of double-counting measure values, because all aggregation is performed prior to the join. When a blended visualization uses multiple secondary data sources, each secondary join key may match any subset of the primary key. The primary mediator handles duplicating each secondary result set as needed to join with the mediated result set.

Finally, a secondary dimension which is not part of the join key (and thus not a grouping field in the secondary query) can still be used in the visualization. If it is functionally dependent on the join key, a secondary dimension can be used without affecting the result set cardinality. Tableau references this kind of non-grouping dimension using both MIN and MAX aggregations in the query issued to the secondary data source, which allows Tableau to determine if the dimension is functionally dependent on the join key. For each row in the secondary result set, if the two aggregated values are the same then the value is used as-is, reflecting the functional dependence on the grouping fields. If the aggregated values differ, Tableau represents the value using a special form of NULL called ManyValues. This is represented in the visualization as a ‘*’, but retains the behavior of NULL when used in calculated fields or other computations. The visual feedback allows a user to distinguish this lack of data from the NULLs which occur due to missing or mismatched data.
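
Here is a small pandas sketch of that MIN/MAX check for a non-grouping secondary dimension, using invented data: where the two aggregates agree, the value is kept as-is; where they differ, a ‘*’ placeholder stands in for Tableau’s ManyValues.

import numpy as np
import pandas as pd

secondary = pd.DataFrame({
    "carrier":      ["AA", "AA", "UA", "UA"],
    "carrier_name": ["American", "American", "United", "United Airlines"],
})

# Query the non-grouping dimension with both MIN and MAX per join-key value.
check = secondary.groupby("carrier")["carrier_name"].agg(["min", "max"]).reset_index()

# If MIN == MAX the dimension is functionally dependent on the join key and the
# value is used as-is; otherwise it is shown as '*' (the ManyValues marker).
check["carrier_name"] = np.where(check["min"] == check["max"], check["min"], "*")
print(check[["carrier", "carrier_name"]])   # AA -> American, UA -> *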

Inferring Join Keys

Tableau uses very simple rules for automatically detecting candidate join keys:

  1. The secondary data source field name must match a field with the same name in the primary data source.
  2. The data types must match.
  3. If they are date/time fields, they must represent the same granularity date bin in the date/time hierarchy, e.g. both are MONTH.

A user can intervene to force a match, either by providing field captions to rename fields within the Tableau data model, or by explicitly defining a link between fields using a simple user interface. A toy version of this matching logic is sketched below.
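
The following Python sketch loosely applies the three rules above to a pair of hypothetical data frames; it is an illustration only, not Tableau’s actual algorithm.

import pandas as pd

def candidate_join_keys(primary, secondary):
    """Toy join-key inference, loosely following the three rules above."""
    keys = []
    for col in secondary.columns:
        if col not in primary.columns:                    # rule 1: names must match
            continue
        if primary[col].dtype != secondary[col].dtype:    # rule 2: data types must match
            continue
        # Rule 3 (dates): a fuller version would also check that both fields are
        # binned at the same level of the date/time hierarchy (e.g. both MONTH).
        keys.append(col)
    return keys

primary = pd.DataFrame({"year": [2007, 2008], "airfare": [310.0, 330.0]})
secondary = pd.DataFrame({"year": [2007, 2008], "cost_per_gallon": [2.2, 3.0]})
print(candidate_join_keys(primary, secondary))   # ['year']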

Part 5 - Figure 3

Another Simple Blending Example

A Tableau data blending scenario is shown in Figure 3 above, which includes multiple views that were composed in minutes by uniquely mashing up four different airline datasets, the largest of which include a 324 million row ticket pricing database and a 140 million row on-time performance database. A user starts by dragging fields from any dataset on to a blank visual canvas, iteratively building a VizQL statement which ultimately produces a visualization. In this example, the user first drags the VizQL fields, YEAR(Flight Date) and AVG(Airfare), from the pricing dataset onto the visual canvas.

Data blending occurs when the user adds fields from a separate dataset to an existing VizQL statement in order to augment their analysis. Tableau assigns the existing dataset to the primary mediator and uses secondary mediators to manage each subsequent dataset added to the VizQL. The mediated schema has a primary key composed of the grouping VizQL fields from the primary dataset (e.g. YEAR(Flight Date)); the remaining fields in the mediated schema are the aggregated VizQL fields from the primary dataset along with the VizQL fields from each secondary dataset.

Continuing our example, the user wishes to drag AVG(Total Cost per Gallon) from the fuel cost dataset to the visualization. The schema matching algorithm examines the secondary dataset for one or more fields whose name exactly matches a field in the primary key of the mediated schema. While the proposed matches are often sufficient and acceptable, the user can specify an override. Since the fuel cost dataset has a field named Date, the user provides a caption of Flight Date to resolve the schema discrepancy. At this point the mediated schema is created and the VizQL workload is then federated to the wrappers for each dataset. Each wrapper compiles VizQL to SQL or MDX for the given workload, executes the query, and maps the result set into the intermediate form expected by the primary mediator.

The mapping is performed dynamically, since both the VizQL and the data model evolve during a user’s iterative analytical workflow. Finally, the primary mediator performs a left-join of each secondary result set along the primary key of the mediated schema. In this example, the mediated result set is rendered to produce the visualization shown in Figure 3(a).

Evolved Blending Example

Figure 3(b) above shows further evolution of the analysis of airline datasets, and demonstrates several key points of data blending. First, the user adds a unique ID field named uniquecarrier from the primary dataset to the VizQL to visualize results for each airline ID over time. The mediated schema adapts by adding this field to its primary key, and the secondary mediator automatically queries the fuel cost dataset at this finer granularity since it too has a field named uniquecarrier. Next, the user decorates the visualization with descriptive airline names for each airline ID by dragging a field named Carrier Name from a lookup table.

This dataset is at a coarser granularity than the existing mediated schema, since it does not represent changes to the carrier name over time. Morton’s (et al) system automatically handles this challenge by allowing the left-join to use a subset of the mediated result set primary key, and replicating the carrier name across the mediated result set. Figure 4 below demonstrates this effect using a tabular view of a portion of the mediated result set, along with portions of the primary and secondary result sets.

The figure also demonstrates how the left-join preserves data for years which have no fuel cost records. Last, the user adds average airline delays from a 140 million row dataset which matches on Flight Date and uniquecarrier. This is a fast operation, since the wrapper performs mapping operations on the relatively small, aggregated result set produced by the remote database. Note that none of these additional analytical tasks required the user to intervene in data integration tasks, allowing their focus to remain on finding insight in the data.

Part 5 - Figure 4

Filtering

Tableau provides several options for filtering data. Data may be filtered based on aggregate conditions, such as excluding airlines having a low total count of flights. A user can filter aggregate data from the primary and secondary data sources in this fashion, which results in rows being removed from the mediated result set. In contrast, row-level filters are only allowed for the primary data source. To improve the performance of queries sent to the secondary data sources, Tableau will filter the join keys to exclude values which are not present in the domain of the primary data source result set, since these values would be discarded by the left-join.
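
A minimal sketch of that join-key filtering, with hypothetical result sets: the secondary query is restricted to join-key values that actually appear in the primary result set, since anything else would be thrown away by the left-join anyway.

import pandas as pd

primary_result = pd.DataFrame({"carrier": ["AA", "UA"], "airfare": [310.0, 305.0]})
secondary = pd.DataFrame({
    "carrier": ["AA", "UA", "DL", "WN"],
    "delay":   [12.0, 9.0, 15.0, 7.0],
})

# Only aggregate secondary rows whose join-key values occur in the primary
# result set; DL and WN would be discarded by the left-join anyway.
domain = primary_result["carrier"].unique()
secondary_agg = (
    secondary[secondary["carrier"].isin(domain)]
    .groupby("carrier", as_index=False)["delay"].mean()
)
print(primary_result.merge(secondary_agg, on="carrier", how="left"))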

Data Cleaning Capabilities

As mentioned in the Inferring Join Keys section above, Tableau supports user intervention in resolving field names when schema matching fails. And once the schemas match and data is blended, the visualization can help provide feedback regarding the validity of the underlying data values and domains. If there are any data inconsistencies, users can provide aliases for a field’s data values which will override the original values in any query results involving that field. The primary mediator performs a left-join using the aliases of the data values, allowing users to blend data despite discrepancies from data entry errors and spelling variations. Tableau provides a simple user interface for editing field aliases. Calculated fields are another aspect of Tableau’s data model which support data cleaning. Calculated fields support arbitrary transformations of original data values into new data values, such as trimming whitespace from a string or constructing a date from an epoch-based integer timestamp.

As with database fields, calculated fields can be used as primary keys or join keys.

Finally, Tableau allows users to organize a field’s related data values into groups. These ad-hoc groups can be used for entity resolution, such as binding multiple variations of business names to a canonical form. Ad-hoc groups also allow constructing coarser-grained structures, such as grouping states into regions. Data blending supports joins between two ad-hoc groups, as well as joins between an ad-hoc group and a string field.
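
To ground those cleaning ideas, here is a hedged pandas sketch of value aliases and an ad-hoc group applied before a blend-style join; the alias table, group definitions and figures are invented for the example.

import pandas as pd

secondary = pd.DataFrame({
    "state":   ["Wash.", "Oregon", "Calif."],
    "revenue": [100, 80, 250],
})

# Value aliases override spelling variations before the join.
aliases = {"Wash.": "Washington", "Calif.": "California"}
secondary["state"] = secondary["state"].replace(aliases)

# An ad-hoc group rolls states up into a coarser-grained region, which can then
# be used as a join key against a primary source keyed by region.
region_group = {"Washington": "West", "Oregon": "West", "California": "West"}
secondary["region"] = secondary["state"].map(region_group)

primary = pd.DataFrame({"region": ["West"], "target": [500]})
blended = primary.merge(
    secondary.groupby("region", as_index=False)["revenue"].sum(),
    on="region", how="left",
)
print(blended)   # region West, target 500, revenue 430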

Next: Data Blending Using MicroStrategy

———————————————————————————-

References:

[1] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.

[2] Hans Rosling, Wealth & Health of Nations, Gapminder.org, http://www.gapminder.org/world/.

[3] Jonathan Drummey, Tableau Data Blending, Sparse Data, Multiple Levels of Granularity, and Improvements in Version 8, Drawing with Numbers, March 11, 2013, http://drawingwithnumbers.artisart.org/tableau-data-blending-sparse-data-multiple-levels-of-granularity-and-improvements-in-version-8/.

 


Filed under: Analytics, Data Blending, Kristi Morton, MicroStrategy, Tableau, VizQL

An Introduction to Data Blending – Part 6 (Data Blending using MicroStrategy)


Readers:

In Part 5 of this series on data blending, we reviewed Tableau’s Data Blending Architecture. With Part 5, I have wrapped up the Tableau portion of this series.

I am now going to post, over the next week or so, several parts discussing how we do data blending using MicroStrategy. Fortunately, MicroStrategy just published a nice technical note on their Knowledgebase (TN Key: 46940) [1] discussing this. Most of what I am sharing today is derived from that technical note.

I probably will have 2-4 parts for this topic in my Data Blending series, including how the MicroStrategy Analytical Engine deals with multiple datasets.

I want to thank Kristi Morton (et al) for the wonderful research paper she wrote at The University of Washington [2]. It helped me provide some real insight into the topic and mechanics of data blending, particularly with Tableau. You can learn more about Ms. Morton’s research as well as other resources used to create this blog post by referring to the References at the end of the blog post.

So let’s now dig into how MicroStrategy provides us data blending capabilities.

Best Regards,

Michael

Data Blending using MicroStrategy

In Part 6, we will begin examining using data blending in MicroStrategy. We will first look at how to use attributes from multiple datasets in the same Visual Insight dashboard and link them to existing attributes using the Data Blend feature in MicroStrategy Analytics Enterprise Web 9.4.1.

Prior to v9.4.1 of MicroStrategy, data blending was referred to as Cube Joining.

In MicroStrategy Analytics Enterprise Web 9.4.1, the new Report Services Documents Engine automatically links common attributes using the modeled schema whenever possible. Manual linking is not allowed between different modeled attributes; if the requirement is to link different attributes, this can be done using MicroStrategy Architect at the schema level. By default, related attributes are linked using a full outer join; if there is no relationship between the attributes, a cross join is used.

The manual attribute linking can be done as shown in the images below.

Part 6 - 1

 

2. Browse the file to match the existing data and select Continue.

Part 6 - 2

 

3. Set the attribute forms if needed. MicroStrategy will automatically assign the detected ones.

Part 6 - 3

4. The attributes can be mapped manually by selecting Link to Project Attribute.

Part 6 - 4

5. Select the attribute form that matches the desired join:

Part 6 - 5

6. The attribute should appear similar to the ones existing in the schema as shown below.

Part 6 - 6

 

7. Save the recently created dataset.

Part 6 - 7

8. Now there are two cubes used as datasets in the same Visual Insight dashboard, as shown below.

Part 6 - 7a

Automatic Linking

The attributes icons now have a blue link, as shown below. This indicates that MicroStrategy has automatically linked them to elements in the Information dataset.

Part 6 - 8

Next: How Data Blending Affects the Analytical Engine’s Behavior in MicroStrategy

———————————————————————————-

References:

[1] MicroStrategy Knowledgebase, How to use attributes from multiple datasets in the same Visual Insight dashboard and link them to existing attributes using the Data Blend feature in MicroStrategy Analytics Enterprise Web 9.4.1, TN Key: 46940, 04/24/2014, https://resource.microstrategy.com/support/mainsearch.aspx.

NOTE: You may need to register to view MicroStrategy’s Knowledgebase.

[2] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.


Filed under: Data Blending, MicroStrategy, Tableau

How Data Blending Affects the Analytical Engine’s Behavior in MicroStrategy (Part 7)


MicroStrategy Analytics Platform

With the release of MicroStrategy Analytics Enterprise 9.4.1, the Analytical Engine logic has been enhanced with respect to joining data from multiple datasets in a Report Services Document. One of the features that is available with this release is the ability to use objects (e.g., attributes, metrics) from multiple datasets in a single grid in a document.

If an attribute on a grid has elements that can be obtained from multiple datasets used in the document, the elements displayed will be from the global lookup table. Additionally, if one or more of the datasets containing the attribute has missing attribute form data or has a different attribute form from the other datasets, the Analytical Engine will follow the rules noted below to compose the final output:

Rule 1:

If an attribute form has a null value, the Analytical Engine will use the non-null form value from the other datasets instead of the null form.

Rule 2:

If several datasets have different attribute form information for the attribute element, the Analytical Engine will use the attribute form from the biggest dataset.

Rule 3:

If several datasets have different attribute form information for the attribute element, and those datasets have the same number of rows, the Analytical Engine will use the first dataset in the document for the attribute form value (according to the dataset adding sequence).

NOTE: Users should note that the rules are applied for each individual attribute element in the result at the row level rather than at the dataset level.
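
To make the precedence of these rules concrete, here is a hypothetical Python sketch that resolves a single attribute element’s DESC form from several candidate datasets. It only illustrates the stated precedence (non-null first, then the largest dataset, then adding order); it is not MicroStrategy’s actual engine logic.

def resolve_desc_form(candidates):
    """candidates: list of (desc_value, dataset_row_count, adding_order),
    one entry per dataset that contains this attribute element."""
    # Rule 1: ignore null forms when any dataset supplies a non-null value.
    non_null = [c for c in candidates if c[0] is not None]
    if not non_null:
        return None
    # Rule 2: prefer the biggest dataset; Rule 3: break ties by adding order.
    non_null.sort(key=lambda c: (-c[1], c[2]))
    return non_null[0][0]

# Customer ID 1 appears in three datasets with conflicting DESC forms.
print(resolve_desc_form([
    (None,         10, 1),   # C01: null DESC form, 10 rows, added first
    ("Customer D", 10, 2),   # C02: 10 rows, added second
    ("Customer A",  8, 3),   # C03: smaller dataset
]))   # -> 'Customer D'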

Example 1:

Users may consider the following datasets – C01 is a dataset with Customer City, Customer and Order:

Part 7 - 1a

 

C02 is a dataset with Customer, Order and a profit metric. Users may note that the Customer attribute is missing the DESC form in the second dataset:

Part 7 - 2a

If a Report Services Document is built with both these datasets, and the attributes are placed on a grid, the following results may be seen. As noted in Rule 1, the Analytical Engine will display the non-Null values from C01 for the Customer attribute elements:

Part 7 - 3a

Example 2:

Now users may consider a different dataset as C02 – similar to the initial dataset, but here the Customer name (DESC) form contains values instead of NULLs. This time the values for the attributes are not consistent – see that Customer ID ‘1’ has different values for the DESC form for different Orders (1 & 6).

Customer Name Customer ID Order Profit
Customer D 1 1 100
Customer B 2 2 200
Customer C 3 3 300
Xia D 4 4 400
Kris Du 5 5 500
Customer A 1 6 610
Customer E 2 7 720
Customer F 6 8 860
Customer G 7 9 970
Customer H 8 10 1080

If a report is built for this dataset, users will observe that the first attribute element value in the dataset is used as the DESC form for Orders 1 & 6 even if the value is different in subsequent rows (this is the same as previous Analytical Engine behavior).

Part 7 - 5

When these datasets are used in a grid in a Report Services Document, the Analytical Engine will choose the attribute element values from dataset C02 to display. This is because of Rule 2, explained above.

Part 7 - 6a

Example 3:

Consider the following dataset:

Customer Name Customer ID Order Profit
Customer D 1 1 100
Customer E 2 7 720
Xia D 4 4 400
Kris Du 5 5 500
Customer G 7 9 970

A report built off this dataset appears as follows:

Part 7 - 8a

After replacing the dataset ‘C02’ from the previous example with the new dataset, the following results are seen. As noted in Rule 3, because both C01 and C02 have the same number of rows, the elements displayed for the Customer attribute will be filled from the first dataset to be added to the document – in this case C01. However, for the first row in the results, where there is no corresponding customer in the dataset C01, Rule 1 will be applied and, instead of a NULL value, the non-null Customer Name field ‘Customer G’ is picked from C02. (Rules are applied at the individual element level.)

Part 7 - 9a

Next: Why are some metric values blank in documents using multiple datasets in MicroStrategy Analytics Enterprise 9.4.1

———————————————————————————-

References:

[1] MicroStrategy Knowledgebase, Engine behavior for grids on a Report Services Document or dashboard with multiple datasets where some attribute forms are missing or have different values across the datasets in MicroStrategy Analytics Enterprise 9.4.1 and newer releases, TN Key: 45463, 03/13/2014, https://resource.microstrategy.com/support/mainsearch.aspx.

NOTE: You may need to register to view MicroStrategy’s Knowledgebase.


Filed under: Data Blending, Data Visualization, MicroStrategy

Data Blending: Why are Some Metric Values Blank in Documents Using Multiple Datasets in MicroStrategy Analytics Enterprise 9.4.1 (Part 8)


MicroStrategy Analytics Enterprise

Introduction

Starting with MicroStrategy Analytics Enterprise 9.4.1, Report Services documents can contain grids with objects coming from more than one dataset.

Multiple Datasets in a Single Grid/Graph/Widget Object in MicroStrategy Web 9.4 [2]

Users now have the ability to add attributes and/or metrics from multiple datasets to a single grid, graph, or widget. For example, if Dataset #1 contains Category and Revenue and Dataset #2 contains Category and Profit, a grid can be created which contains Category, Revenue, and Profit.

Part 8 - 1

Administrators can control the use of multiple datasets in a single grid, graph, or widget through the Analytical Engine VLDB properties window at the project level.

  1. Right mouse click (RMC) on the project name.
  2. Select Project Configuration.
  3. Click on Project Definition.
  4. Select ‘Advanced’.
  5. Click “Configure” under the Analytical engine VLDB.

Part 8 - 2

NOTE:

The default value is set to: “Objects in document grids must come from the grid’s source dataset only”.

Users can set the source of the grid to a particular dataset or choose no dataset (in which case, the MicroStrategy engine will determine the best suited dataset). [1]

The MicroStrategy Analytical Engine displays no data for metrics in ambiguous cases or when there is a conflict. Ambiguous cases can arise where multiple datasets contain the same objects. Examples based on the MicroStrategy Tutorial project have been provided to explain this behavior.

Note: When the MicroStrategy Analytical Engine cannot resolve the correct dataset, as explained in the cases below, the data displayed will correspond to the value chosen for the missing object display under Project Configuration > Report definition > Null values > Missing Object Display. The default value for this is blank.

Case 1:

Multiple datasets have the same metric. Only one dataset does not contain this metric and this dataset is set as the source of the grid.

This case is explained with an example based on the MicroStrategy Tutorial project.

1. Create the following objects:

a. Dataset DS1 with the attribute ‘Year’ and metric ‘Profit’.

b. Dataset DS2 with the attribute ‘Year’ and metrics ‘Profit’, ‘Revenue’.

c. Dataset DS3 with the attribute ‘Quarter’ and metric ‘Cost’.

2. Create a document based on the above datasets and create a grid object on the document with the following objects: ‘Year’, ‘Quarter’, ‘Profit’. Set the source of this grid to be the dataset ‘DS3’.

3. In the executed document, no data is displayed for the metric ‘Profit’ as shown below.

Part 8 - 3

In the above example, the metric ‘Profit’ does not exist in the source dataset ‘DS3’ but exists in more than one dataset in the document, i.e., it exists in both ‘DS1’ and ‘DS2’. Since the engine cannot just randomly pick one of the two available datasets, it chooses not to display any data for this metric. If users do not want such blank columns to be displayed, they should set the source dataset so that such ambiguity does not arise.

Case 2:

The same metric exists multiple times on the grid. For example, users can have a smart compound metric and a component metric of this compound smart metric on the grid in the document. The smart metric and the component metric are from different datasets.

This case is also explained with an example based on the MicroStrategy Tutorial project.

1. Create the following objects:

a. Dataset DS1 with attribute ‘Year’ and metric ‘Profit’.

b. Dataset DS2 with attribute ‘Year’ and metrics ‘Revenue’, ‘Profit’, ‘Profit Margin’ (this is a compound smart metric calculated from metrics Revenue and Profit).

2. Create a document based on the above datasets and create a grid object on the document with the following objects: ‘Year’, ‘Revenue’, ‘Profit’ and ‘Profit Margin’. The source of this grid object is set to DS1.

3. In the executed document, no data is displayed for the metric ‘Profit Margin’, as shown below.

Part 8 - 4

In the above example, since the source of the grid is set to ‘DS1’, the ‘Profit’ metric is sourced from this dataset and the metric ‘Revenue’ is sourced from the dataset ‘DS2’ (as this is the ONLY dataset with this metric). However, for the metric ‘Profit Margin’, the component metric ‘Profit’ exists on dataset ‘DS1’, so this becomes a conflict metric and is not displayed. If the source of the grid is changed to ‘DS2’, the data is displayed correctly as shown below.

Part 8 - 5

References:

[1] MicroStrategy Knowledgebase, Why are some metric values blank in documents using multiple datasets in MicroStrategy Analytics Enterprise 9.4.1, TN Key: 44517, 12/16/2013, https://resource.microstrategy.com/support/mainsearch.aspx.

[2] MicroStrategy Knowledgebase, Multiple datasets in a single grid/graph/widget object in MicroStrategy Web 9.4, TN Key: 44944, 09/30/2013, https://resource.microstrategy.com/support/mainsearch.aspx.

NOTE: You may need to register to view MicroStrategy’s Knowledgebase.


Filed under: Data Blending, MicroStrategy

Bryan’s BI Blog: MicroStrategy vs Tableau

Bryan Brandow: Triggering Cubes & Extracts using Tableau or MicroStrategy


trigger-720x340

Bryan Brandow (photo, right), a Data Engineering Manager for a large social media company, is one of my favorite bloggers out there when it comes to thought leadership and digging deep into the technical aspects of Tableau and MicroStrategy. Bryan just blogged about triggering cubes and extracts on his blog. Here is a brief synopsis.

One of the functions that never seems to be included in BI tools is an easy way to kick off an application cache job once your ETL is finished.  MicroStrategy’s Cubes and Tableau’s Extracts both rely on manual or time based refresh schedules, but this leaves you in a position where your data will land in the database and you’ll either have a large gap before the dashboard is updated or you’ll be refreshing constantly and wasting lots of system resources.  They both come with command line tools for kicking off a refresh, but then it’s up to you to figure out how to link your ETL jobs to call these commands.  What follows is a solution that works in my environment and will probably work for yours as well.  There are of course a lot of ways for your ETL tool to tell your BI tool that it’s time to refresh a cache, but this is my take on it.  You won’t find a download-and-install software package here since everyone’s environment is different, but you will find ample blueprints and examples for how to build your own for your platform and for whatever BI tool you use (from what I’ve observed, this setup is fairly common).  Trigger was first demoed at the Tableau Conference 2014.  You can jump to the Trigger demo here.

I recommend you click on the link above and give his blog post a full read. It is well worth it.

Best regards,

Michael


Filed under: Bryan Brandow, Bryan's BI Blog, ETL, MicroStrategy, Tableau, Triggers

Cognos, RAVE and D3: An Interview with Cognos Paul


Within the worldwide Cognos community, when you ask someone who to turn to about some special trick or complex feature you need to implement, the first name that comes out is Cognos Paul (Photo, right).

Paul Mendelson, aka Cognos Paul, is a certified freelance Cognos developer. He has been working, tinkering, and playing with Cognos since 2007.

For most of his professional career, Paul has consulted on projects from a wide array of companies. While sometimes difficult, especially as a project comes to a close, this has given him the opportunity to learn from a wide range of methodologies spanning many industries. Paul’s clients have included banks, pharmaceutical companies, government and military organizations, institutions dealing in manufacturing, logistics, insurance, and telecoms (the list goes on). Without the opportunities of working for these clients, Paul feels he would not know half of the techniques half as well as he should, and would like Cognos half as much as it deserves.

If you have a challenging Cognos question and are seeking help, you can contact Paul at cognospaul@gmail.com and come to an arrangement.

1. IBM recently released its latest version of Cognos (rebranded as Cognos Analytics). Can you tell us your thoughts on the new release and also how Watson Analytics plays a part in it?

Cognos Paul: The new version seems heavily self-service centric, with the advanced dashboarding tools and the various improvements to Workspace Advanced. The most exciting part of it is the expanded data sources used to power the new dashboards. It should make self-service dashboards much easier for users to build. The caveat is that, as usual, it is a complex tool. Users will absolutely need to be trained, or the Cognos IT group will be swamped with issues.

Watson is a drastically new direction. While I haven’t played with it much myself, it seems to make statistical analysis open to non-statisticians. I still have some reservations, but I’m looking forward to seeing more.

2. From a data visualization perspective, why would I want to consider Cognos Analytics versus, say, a Tableau, Microsoft Power BI or MicroStrategy?

Cognos Paul: Data visualization is actually one of Cognos’s historically weak spots (although they are working on it, with RAVE). I believe Tableau still maintains the standard for advanced dataviz capabilities. That being said, the other capabilities offered by Cognos more than make up for it.

The flexibility of report design and ways end users can consume the reports is without compare.

Cognos Analytics

3. MicroStrategy recently released their v10.3 which offers an integration to a D3.js library. Is Cognos Analytics doing anything similar?

Cognos Paul: Several years ago, IBM released a tool called the Rapidly Adaptive Visualization Engine, or RAVE. Using a declarative language, the author is able to easily and quickly build very advanced graphs. Users can define the graph to modify shape, size, color, and opacity based on any elements in the data set.

Admittedly, RAVE doesn’t offer the complexity of D3, which is why D3 is integrated in one of the upcoming versions. From what I understand, RAVE will be able to use almost any publicly available D3 library.

4. What advice would you give a developer who is new to Cognos?

Cognos Paul: Taking a Framework Manager (FM) course is absolutely necessary. The best practices for framework model development make sense, and deviating from them can cause performance issues.

Report development does take some time to get the hang of. There are always multiple ways of doing things, and if you’re working hard, you’re doing something wrong. Try multiple things and don’t be afraid to ask questions. Google Search is your friend; if you’re having a problem with something, then chances are good that other people have as well.

Most important, don’t be afraid to try new things. There are many things Cognos can do, and many associated tools. If something isn’t working one way, there is a very good chance it will work another.

 


Filed under: Cognos Paul, Dataviz, IBM Cognos, IBM RAVE, Microsoft Power BI, MicroStrategy, Tableau, Uncategorized