Apr
14

Microsoft Gold Business Intelligence and Data Platform Competencies

Once again, Microsoft have awarded Altius their Gold-level competencies in Business Intelligence and Data Platform. This award signifies that Altius has demonstrated best-in-class capabilities within these solution areas.

 

Our complete list of competencies is:

  • Gold Business Intelligence
  • Gold Data Platform
  • Silver Application Development
  • Silver Midmarket Solution Provider

 


Apr
02

Microsoft UK Financial Services Industry Lead to present at Altius Business Intelligence Strategy Event

Bruce McKee, Microsoft UK Financial Services Industry Lead, will be the keynote speaker at Altius’ forthcoming industry briefing event on Business Intelligence Strategy.  Attendees at the event, to be held at the Grand Hotel on the 30th April, will hear from:
  • David Kilpatrick, Altius Information Strategist, who will demonstrate how and why a strategy for business intelligence should be a key component for any organisation wishing to innovate and fully exploit all the information they have at their disposal; and 
  • Bruce McKee, Microsoft UK Financial Services Industry Lead, who will discuss the Microsoft view on how the advent of trends such as Cloud, Social, Mobile and Big Data represent even greater opportunity to drive further innovation and achieve greater business insight.
John Gamble, Practice Lead for Altius Channel Islands, said: “Business Intelligence projects can no longer be approached in isolation; they should be aligned to support strategic goals and objectives. A well-defined BI strategy is crucial to gaining the maximum value from data. Our event is intended to highlight this, and to that end I’m absolutely delighted to have both Bruce and David attending. David has over 20 years’ experience helping companies define and execute BI strategies and Bruce is very highly regarded within the Financial Services technology industry.
 
It’s great to see such a wide variety of Channel Islands-based Financial Services companies already registered to attend the event.”
 
For more information or to register for this event, please contact john.gamble@altius.je or steve.waterman@altius.je

Mar
20

Altius assists in European implementation of the new version of Temenos Insight R13

Altius, the Channel Islands’ specialist Business Intelligence company, has begun its first European implementation of the new release of Temenos’ Business Intelligence product, known as ‘Insight’, for an international private bank. This builds upon Altius’ experience of implementing the previous version of the product in the UK and Channel Islands.  The new version of Insight, known as R13, contains many new reporting and analytical features whilst retaining the tight integration with the T24 banking system.

Charles Robertson, Altius – Insight Project Lead, said: “Insight has received significant improvements in R13, including a SharePoint user interface, new analysis cubes and greater user configurability. It is a powerful tool for getting the most out of the data in your T24 system.”

John Gamble, Channel Islands Practice Lead, said: “I’m delighted that members of the Altius Channel Islands BI team have been able to assist in the implementation of Insight R13. I’m especially delighted that we’ve got to work with the new version and we look forward to helping more T24 customers get the most from their Business Intelligence and reporting systems.”


Mar
03

Gartner’s Magic Quadrant for Business Intelligence and Analytics Platforms 2014

As some of you may be aware, Gartner just released the 2014 version of their BI and Analytics Magic Quadrant. John did an article on their previous Magic Quadrants (found here), but his was more focused on comparing how vendors have changed position in the quadrant over time and on trying out the Tableau visualisation software.

Given that Gartner themselves state that this sort of approach is “not particularly useful” this year, due to deliberate goalpost-shifting on their part, simply commenting on the difference between last year’s quadrant and this year’s was out. Instead, this blog aims to provide a summary of the actual report (found here), which at 35,082 words and over 55 pages is a fairly hefty slog! The main points covered in this blog are:

  • Market definition
  • Inclusion criteria
  • Evaluation criteria
  • Segments
  • Market overview and trends seen
  • A look at some of the more interesting companies featured (this section only features the four companies that I found interesting; you may find others more noteworthy)

Anyway, a quick (relative to the 5,000 words the report devotes to the topic!) reminder of how a Magic Quadrant works:

 

Market definition

First, Gartner define what exactly a BI and Analytics platform is. Their definition can be roughly paraphrased as any platform that delivers specified capabilities (full list in the report) across three categories:

  • Information delivery
  • Analysis
  • BI integration

 

Inclusion Criteria

Once Gartner settle on a definition, they then decide on which BI/Analytics vendors to include in the Magic Quadrant. Their inclusion criteria are:

  • Generates more than $15 million in total BI-related software license revenue annually.
  • If a vendor also sells transactional applications, they must demonstrate that their BI platform is used by customers who do not use their transactional applications.
  • Delivers at least 12 out of the 17 capabilities mentioned in Gartner’s definition of a BI and Analytics platform.
  • Has collected more than 30 survey responses from customers that use its platform in a production environment.

After laying out these inclusion criteria, Gartner conduct fairly exhaustive client-based research (for full details, see here) and evaluate the included companies’ performance in various categories (these are fairly self-descriptive but, as ever, there’s a more detailed explanation in the actual report!) to give a weighted score for “Ability to Execute” and “Completeness of Vision”.

Whilst “Ability to Execute” is a fairly objective measure, “Completeness of Vision” is a somewhat more subjective measure. This year, Gartner looked for:

  • Different approaches to pervasive, governed use.
    • This is covered in more detail in the “Market Overview/Trends” section later.
  • A variety of deployment models.
    • Integration of BI platforms with a variety of cloud-based data sources, mobile BI platforms and successful embedding of BI functionality in existing business processes were all assessed.
  • Different types of data sources and analyses.
    • Given that the largest growth areas for data generation are real-time event streams emitted by sensors, machines and transactional systems (the “Internet of Things”), the ability to perform interesting and insightful analyses on these varied data sources was scored highly.

 

“Ability to Execute” and “Completeness of Vision” are each scored against their own set of criteria, which are listed side by side in the full report.

These results are then plotted on a graph and split into 4 sections – niche players, challengers, leaders and visionaries – to give you the Magic Quadrant seen above.

 

Segments

Niche players tend to do well in a specific area of the market, but can either lack broader platform functionality or, if a broader platform exists, lack implementation and support capabilities.

Challengers are vendors that are well positioned to succeed in the near future and have a high ability to execute, but currently lack the marketing efforts, sales channel, geographic presence, industry-specific content or awareness of the vendors in the Leaders quadrant.

Leaders are vendors with high scores in both categories – they tend to be able to deliver enterprise-wide implementations of a wide range of BI platform capabilities on a global basis. Whilst this sector is dominated by megavendors such as Microsoft, Oracle and IBM, smaller vendors still feature in this sector if they combine comprehensive market understanding, capabilities and road maps with excellent execution and high client satisfaction.

Visionaries have a strong and unique vision for delivering a BI platform. They offer great depth of functionality in the areas they address, but may have gaps relating to broader functionality requirements or there may be concerns about their ability to grow and provide consistent execution.

 

Market Overview/Trends

Governed Data Discovery – Following on from last year’s Magic Quadrant, Gartner has shifted the emphasis slightly from data discovery to what they are terming “governed data discovery”. This phrase, which is repeated frequently enough throughout the report to be deemed a buzzword, is being used to describe platforms which combine the flexible, easy-to-use, self-service and ad hoc elements of data discovery tools with the requirements of enterprise IT such as governance, scalability, ease of administration and security.

 According to Gartner’s research, allowing clients to control and govern enterprise-scale deployments is key if these new data discovery-centric platforms are to replace, rather than complement, the current IT-centric model. A perfect example of this is Tableau, which despite being ranked very highly by business users for both ease of use and performance is considered the BI standard by fewer than half their customers, largely due to the lack of enterprise features.

Currently, the megavendors tend to score highly on the enterprise IT side of things, but lack data discovery platforms as potent or popular as those sold by vendors such as Tableau or Qlik. Meanwhile, the vendors specialising in data discovery tend to lack the ability to execute enterprise-wide implementations of their platforms, resulting in an empty middle ground that all current leaders are racing towards – whether that be by attempting to integrate data discovery platforms into their current enterprise offerings (as Microsoft are doing with PowerBI) or through a more radical shift in focus such as that seen in Qlik (who are releasing a completely re-architected product) and SAS, whose new “Visual Analytics” platform is aimed much more at business users rather than their previous market of data scientists, power users and IT-centric BI developers.

As a result of this change in emphasis, the entire “Leaders” quadrant has been shifted left – if nobody is filling this segment of the market, then their completeness of vision is evidently lacking!

Self-service data integration – one of the other major challenges facing current vendors is providing a platform that allows more flexible and user-driven data preparation. When combined with automatic discovery and highlighting of patterns and findings in data, recent advances which make information access and data modelling much easier and faster will lead to “smart” data discovery (with IBM’s Watson Analytics already aiming for this) and preparation.

This new technology, Gartner predicts, could bring sophisticated analyses to current non-consumers of BI, a potentially massive user base given that currently approximately 70% of employees within respondent organisations either have no involvement with BI at all or have no statistical background. The aim is to “make hard types of analysis easy”, thereby enabling better decision making at all levels.

Cloud BI – over the last year or so, cloud-based solutions have moved into the mainstream, with 45% of respondents saying that they would trust cloud-based solutions for their mission-critical BI.

Advanced Analytics – predictive and prescriptive analytics are now becoming a large and important field of their own, to the extent that they have been removed from the BI and Analytics Magic Quadrant and given a Magic Quadrant of their own.

 

Interesting respondents

New entrants – there are three first-time respondents this year:

  • Pyramid Analytics offer a web-based BI platform, BI Office, which is based on the Microsoft BI stack. It offers the full range of analytic capabilities, but its primary focus is more complex, in-memory and OLAP-based analysis and data visualization. They score strongly in geospatial and OLAP capabilities, but low in most other areas, and are considered overly dependent on the Microsoft stack.
  • Infor’s business intelligence platform is part of their end-to-end platform that encompasses BI and performance management capabilities, both based on the MIS Alea product acquired from Systems Union in 2006. The overall impression from Gartner’s research is that they currently suffer from relatively poor market understanding (reasonable, given their recent entry to the market) and platform performance but have a comprehensive and aggressive roadmap to improvement, with Gartner predicting highly improved scores next year.
  • Yellowfin also offer an end-to-end BI and data integration platform which focuses more heavily on user-friendliness and the social aspect of BI. Standout features include an enhanced storyboard, a fully integrated and interactive PowerPoint-like presentation and collaboration module, and a unique new timeline feature that records a user’s specific activities and interactions in real time. Their main challenges were assessed to be a limited presence outside of Asia and issues with product quality.

Significant changes – including both some bold developments by Leaders and the only two companies to change quadrant:

  • IBM – One of the main megavendors, IBM’s most interesting development this year is the announcement of their Watson Analytics platform, due to be released in 2014. This platform allows business users with little or no technical/statistical knowledge to analyse data using “smart data discovery” technology. Watson Analytics is set to be a cloud-based tool which uses natural language querying to access datasets, correlate information and come to conclusions which are then presented in a digestible fashion to the business user. Gartner wax lyrical about the potential of this platform, going so far as to call it the “discovery tool that may transform the paradigm of how information is used in organizations”.
  • Qlik – as previously mentioned, Qlik are planning to release QlikView.Next, a completely re-architected product that features the redesigned Natural Analytics platform. Natural Analytics “builds on the company’s associative search capability and incorporates enhanced comparisons, collaboration, workflow, sharing and data dialogs” and also offers unique visualisation techniques following Qlik’s acquisition of NComVA. In addition to Qlik’s traditionally strong business-user experience, QlikView.Next also offers a re-architected enterprise server and admin capabilities, placing Qlik squarely as one of the main contenders to reach the currently elusive middle ground previously discussed.
  • Alteryx – one of the two companies changing segments – in this case, from “Niche Player” to “Visionary” – Alteryx specialise in ad hoc analyses and performing advanced analytics on location data from providers like D&B, Experian, the U.S. Census Bureau and TomTom. They received the highest capability and use scores for ad hoc reporting and querying, geospatial and location intelligence, and the research reveals that they deliver business benefits in the top quartile. Their move into the “Visionary” segment comes as a result of very strong scores for innovation, market understanding and product strategy. The strength of their advanced analysis capabilities is such that they also feature in the new Magic Quadrant created for that sector.
  • Panorama Software – their exceptional customer results, their unique, native social and collaborative-based data discovery experience and their top-ranking market understanding have led to a move from the “Niche Player” segment to the “Visionary” segment for Panorama this year. Their main product, Panorama Necto, has strong OLAP capabilities, allowing customers to conduct more complex types of analysis than most other vendors in the survey. However, unlike most other OLAP front-end tools, Necto also offers these analyses within a social and collaboration-based guided data discovery user experience, making it one of the vendors best placed to fill the “governed data discovery” gap in the BI market.

Whilst this “brief” overview ended up being not quite as brief as I’d intended, hopefully it provides a decent summary of the most salient points in the report. If you’ve got the time, I’d recommend reading the actual document, as it’s pretty interesting stuff. The main takeaway is the growing importance of governed data discovery, but the report also features Gartner’s thoughts on numerous other aspects of the BI sector’s future.

Cheers for reading,

Matt

 

 


Nov
28

BCS Jersey event: Big Data In Jersey

Many thanks to everyone who attended my presentation this lunchtime on Big Data in Jersey with the Jersey BCS.  Great to see a sold-out event.  We hope you enjoyed it and saw how it is possible to perform analysis and gain insight on some very unstructured data.  For everyone who was there, please do use the USB memory sticks – there’s 4 GB of space on them, so they should be useful – and I would also encourage you to use the Excel workbook on them to perform your own analysis of the data set.  It’s publicly available in the Windows Azure cloud, so please feel free to query away.  Let us know if you have any problems.

The data has now been updated, and has transcripts up to the 5th November 2013. There are also now some visualisations available. Here are the links:

States of Jersey Hansard Visualisation Handout (PDF)

Excel workbook with data connections and example pivot tables.

Two-word phrase clouds

Word clouds

Blog Posts

Slide deck (2.6Mb PDF)

If you have any questions, comments or feedback, please feel free to contact me: charles [dot] robertson [at] altius [dot] je or @charles_jsy


Sep
26

Kerberos Delegation 101

Twice over the past couple of weeks I’ve been asked by two separate people what I know about Kerberos delegation. Truthfully I replied, “Not a lot. That’s usually been put in place already by whoever configured the server infrastructure and user accounts.” Luckily for me, that was the case this time too, but it piqued my interest enough for me to spend some time gaining a better understanding of an authentication protocol that I have only ever really taken for granted.

Even though there is a vast amount about this topic already on the net, I still struggled to collate it into something small and digestible. I hope this short(ish) blog helps someone else out who just wants to be able to “pick it up” and get things working… hopefully!


Who needs Kerberos?

I won’t get too much into what Kerberos is; you can easily check that out on Wikipedia. But what does it allow us to do, and why do we sometimes need it?

While Kerberos is often considered more secure and places less load on domain controllers, the main reason it becomes necessary is when the credentials of an already authenticated user need to be passed from one machine to another. In the BI world this is typically something as simple as applying row-level security to data that is being surfaced via an intermediary. This could be Reporting Services, or a website such as SharePoint perhaps.

In a simple scenario like this, the “Client” is logged into the “Presentation” server, perhaps even over the internet, and the “Presentation” server is surfacing some data to that user. The trick comes when the data being surfaced needs to be restricted by the original user’s credentials.

The key things to consider are that the:

  • “Presentation Layer” contains an account that is allowed to delegate
  • “Data Sources” contain definitions of services that can be delegated to via a defined Service Principal Name

To set this all up is actually not the horrible undertaking that many people think it to be. We can simply step through the following tasks.

  1. Identify data sources – Create SPNs for each one.
  2. Enable delegation from the “Presentation” server service account(s)
  3. Assign which SPNs the “Presentation” service account(s) can delegate to.

Easy! ;)


Assumptions

First off, the following points rely on a couple of assumptions with regards to the infrastructure in use.

I’ve assumed a Microsoft environment, with MS SQL Server as a back end. This is what I personally have utilised in the past, but Kerberos delegation is by no means limited to this. The information below should stand you in good stead for working out the modifications required for non-SQL data sources.

Secondly, the domain and server versions are assumed to be modern-ish, in that they are at least the 2008 editions. Although the principles are the same for earlier versions of both the domain and SQL Server, some additional work is required; that work won’t be covered here, but it will be pointed out, so Google is your friend there.


Create Service Principal Names

Ok, so the first step is to create Service Principal Names (SPNs) for each service that will need to be delegated to. This is in essence a directory of services that will ultimately be assigned to services running on the presentation server so that credentials can be ‘handed-off’ successfully.

An SPN is created with a PowerShell command constructed as follows:

setspn.exe -S <SPN> <AccountName>

This is simple enough, but when you start getting into how to actually construct these arguments it can get tricky. More information about setspn.exe can be found here.

  • SPN = <service class>/<NetBIOS | FQDN>[:<port | instance>]
  • AccountName = AD\ServiceAccount

Ok, perhaps the AccountName portion isn’t so tricky. It is, after all, just the domain account that runs the service that you are creating the SPN for. It is best practice in MS SQL land to run these services as domain accounts, but it’s not a requirement. If a service isn’t running under a domain account, just use the host name instead of the user account.

But what about service class? As a quick starter for 10, if a SQL Server data source was running as a default instance on a machine named SQL-SERVER-01 under the user credentials of DOMAIN\SQL01-Svc the syntax would look like this:

setspn.exe -S MSSQLSvc/SQL-SERVER-01 DOMAIN\SQL01-Svc

The service class for this is before the first slash (MSSQLSvc), and a typical list of them for a Microsoft SQL implementation environment might look like this:

Service Class    Description                     Port    Notes
MSSQLSvc         SQL Server DB Engine            1433
MSOLAPSvc.3      SQL Server Analysis Services    2383
HTTP             IIS Service                     80      Also for SSL apparently (untested)

 

There are some other gotchas here too, but instead of reeling them off in a paragraph that is a bit tricky to understand, I’ll just throw out some examples, all on the machine named SQL-SERVER-01 with the services running under the credentials of DOMAIN\SQL01-Svc:

# Default SQL Server Instance

setspn.exe -S MSSQLSvc/SQL-SERVER-01 DOMAIN\SQL01-Svc

setspn.exe -S MSSQLSvc/SQL-SERVER-01.Domain.Full.Path DOMAIN\SQL01-Svc

# SQL Server Named Instance – just specify the port

setspn.exe -S MSSQLSvc/SQL-SERVER-01:123456 DOMAIN\SQL01-Svc

setspn.exe -S MSSQLSvc/SQL-SERVER-01.Domain.Full.Path:123456 DOMAIN\SQL01-Svc

# SSAS Default Instance

setspn.exe -S MSOLAPSvc.3/SQL-SERVER-01 DOMAIN\SQL01-Svc

setspn.exe -S MSOLAPSvc.3/SQL-SERVER-01.Domain.Full.Path DOMAIN\SQL01-Svc

# SSAS Named Instance – add the name of the instance, i.e. “Tabular”

setspn.exe -S MSOLAPSvc.3/SQL-SERVER-01:Tabular DOMAIN\SQL01-Svc

setspn.exe -S MSOLAPSvc.3/SQL-SERVER-01.Domain.Full.Path:Tabular DOMAIN\SQL01-Svc

 

Note: Here I have added a NetBIOS and an FQDN version of each command. It’s probably a good idea to run both, as some applications might form one type of SPN and others another – although in my investigations it is most often the FQDN version that is used.

You can also, if you have it all set up, use DNS records to identify servers, but make sure these are “A” records as CNAMEs just won’t work. This is part of the Kerberos specification.
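Once the SPNs are in place it’s worth sanity-checking what has actually been registered. Purely as an illustrative sketch using the same example account as above, setspn.exe can read the records back for you: -L lists the SPNs registered against an account, -Q queries whether a specific SPN exists, and -X hunts for duplicate SPNs (a very common cause of Kerberos failures).

# List every SPN currently registered against the service account
setspn.exe -L DOMAIN\SQL01-Svc

# Check whether a specific SPN already exists anywhere in the domain
setspn.exe -Q MSSQLSvc/SQL-SERVER-01.Domain.Full.Path

# Hunt for duplicate SPNs across the forest
setspn.exe -X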


Enable Delegation

Once the SPN records have been created within Active Directory, the service accounts that delegation will be performed from need to be enabled for this purpose. Luckily this is quite straightforward, and is actually just a case of creating a dummy SPN record within AD for each of these services.

For example, if there was a service that ran SharePoint Excel services on the presentation server named DOMAIN\Excel-Svc, a sample SPN might look something like this:

setspn.exe -S DUMMY/Excel-Svc DOMAIN\Excel-Svc

The “DUMMY/Excel-Svc” portion can actually be anything you like, but this seems sensible to me.

The single reason this step is required is to ensure that the ‘Delegation’ tab becomes visible in the AD properties of that user account.


Adding SPNs to Service Accounts

Once the delegation tab is enabled on each of the service accounts that will need to be delegated from, it’s time to tell them which services that they can delegate to.

  1. On the Delegation tab, choose the “Add…” button.
  2. Enter the name of the service account that you wish to delegate to, and search AD for it.
  3. The resulting dialog will list all of the SPNs defined against that account; select the ones that you want to be able to delegate to. Here I selected the default instances of SQL Server and Analysis Services.

Choose OK, and then we are all done!!

Note: With Active Directory functional levels earlier than 2008, this search mechanism may not be available. I’d hope people aren’t still running levels that old, but if they are… get googling!


Wrap Up & Verification

The only thing really left to do is to verify that everything works! The best verification method I know of is to fire up SQL Server Profiler with the default settings and watch the incoming requests to the target data sources while they are being accessed from the presentation layer.

All being well, you should be able to clearly see the username of whomever you are logged in as from a client machine attempting to access the data source(s). From there on it is just standard security access permissions to contend with.
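If you’d rather not trawl through a Profiler trace, a quick alternative check is to ask the database engine how connections were authenticated. This is just a rough sketch – it assumes the SqlServer (or SQLPS) PowerShell module is available and that the data source is SQL Server 2008 or later – run against the example SQL-SERVER-01 box while someone is using the presentation layer. Delegated connections should show the end user’s login with an auth_scheme of KERBEROS rather than NTLM.

# Rough sketch – assumes the SqlServer (or SQLPS) module; adjust the instance name to suit
Invoke-Sqlcmd -ServerInstance "SQL-SERVER-01" -Query @"
SELECT s.login_name, c.auth_scheme
FROM sys.dm_exec_connections AS c
JOIN sys.dm_exec_sessions AS s ON s.session_id = c.session_id;
"@ | Format-Table -AutoSize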

That’s all the setup done. I strongly encourage planning the SPNs and service names in detail long before you actually start the configuration, as it will save you a lot of potential pain. There is a very good document from Microsoft entitled “Microsoft BI Authentication and Identity Delegation” that is highly recommended bedtime reading.

Kerberos delegation is actually quite straightforward: provided the planning is in place, it should only take perhaps 30 to 40 minutes to implement and verify (troubleshooting a broken setup can take much longer).

I hope this blog helps someone. Feel free to ask questions in the comments and I’ll do my best to help out.


Aug
15

DATA VISUALISATION IN ALTIUS

WHAT DOES IT REALLY MEAN?

‘Data Visualisation’ has been one of the hottest phrases around in the past few years. However, by speaking to even a small cross section of people within the technology industry, let alone amongst our clients and the wider world, it is clear that it means different things to different people. Because of this, its definition is very unclear and has become a convenient – and confusing – ‘catch all’ phrase. Indeed, a quick search online will throw up the following descriptions within the first half a dozen results:

According to Friedman (2008) the main goal of data visualisation is to communicate information clearly and effectively through graphical means. It doesn’t mean that data visualisation needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key-aspects in a more intuitive way.

Definition from WhatIs.com – Data visualisation is a general term used to describe any technology that lets corporate executives and other end users “see” data in order to help them better understand the information and put it in a business context. Visualisation tools go beyond the standard charts and graphs used in Excel spreadsheets, displaying data in more sophisticated ways…

From Wikipedia, the free encyclopaedia – Data visualisation is the study of the visual representation of data, meaning information which has been abstracted in some schematic form, including attributes or variables for the units of information.

My own short and sweet definition for it is:

Data visualisation is a visual communication of information and big data generated by different research and business intelligence processes in an understandable, clear and effective manner.

 

HAVE WE EMBRACED IT?

As an innovative company we are perfectly positioned to embrace data visualisation and use it to its fullest potential, but for some reason the wider industry has been very slow to adopt it. Simple design re-touches and use of the latest technology can demonstrate just how effective good data visualisation can be.

Our visual outputs through web portals, which so far have been simple and traditional charts, are passed around their individual sectors and the key stakeholders of our clients far more rapidly and widely than our traditional deliverables. Therefore as an industry we need to adapt and embrace new ways to deliver data.

We are all far more literate at interpreting data visualisations than a lot of research staff or clients imagine. Just take a look at Sunday supplements, news bulletins and countless magazines and newspapers to see how common these are becoming. However, when creating data visualisations it is important for the Business Intelligence Strategist and the Designer to work together to ensure that the content is not only relevant, accurate and informative, but also presented in an engaging, accessible way.

 

THE GOOD, THE BAD AND THE UGLY VISUALISATIONS?

Effective data visualisation needs two components to make it really work – data and a story. Good visualisation really supports the story. A bad visualisation lives on its own, outside the narrative and the context it originated from. If the story is pre-written and the visualisation is bolted on separately, it is not put to good use and takes its ugliest form.

Good data visualisation can help users explore and understand the patterns and trends in data, and also communicate that understanding to others to help them make robust decisions based on the data being presented. If we manage to make a guided interactive experience for users through our visualisations, we will be able to tell very interesting and very engaging stories, while empowering businesses with the best Performance Management and Business Intelligence tools.


Jul
09

Microsoft announce PowerBI for Office 365 at Worldwide Partner Conference

Big news overnight in the Business Intelligence world from Microsoft’s Worldwide Partner Conference.  Microsoft announced the imminent release of PowerBI for Office 365, which brings all of the desktop BI features for Excel to the Office 365 version.  Two Excel plug-ins released earlier this year, Data Explorer and GeoFlow, have also been included in the release and have been re-branded as Power Query and Power Map (authors out there – notice the spacing in the product names). The full product suite is as follows:

  • Power Query, enabling customers to easily search and access public data and their organization’s data, all within Excel (formerly known as “Data Explorer“).
  • Power Map, a 3D data visualization tool for mapping, exploring and interacting with geographic and temporal data (formerly known as product codename “Geoflow“).
  • Power Pivot for creating and customizing flexible data models within Excel.
  • Power View for creating interactive charts, graphs and other visual representations of data.

Data can be stored and accessed from either on-premises or cloud-based hosting; no doubt SharePoint Online will form a key part of this.  PowerBI also includes a natural language query capability, and it will be interesting to see how good this is (I’ve registered for the preview, so I’ll tell you more when I get it).

Mobile
The really big part of all this, however, is the fact that PowerBI also supports mobile devices, including iPads.  About time too.  A lot of us have been saying for a while now that the new plugins, especially Power View, would port well to mobile devices.  Well, these now have added HTML5 support, which enables them to work on any device.  No mention of what’s happened to Silverlight, but I suspect it’s still there.  Looks like the distribution of data and reports in the new mobile/cloud world just got a whole lot easier.

Competitors
It will be interesting to see what the other vendors do now to top this.  It could be argued that in one big jump Microsoft has just caught up with a lot of them.  Oracle’s OBIEE+ product, for example, has supported mobile devices for some time now, and vendors such as QlikView and Tableau have had better visualisation capabilities.  So in that respect it could be seen as just an ever-maturing product catching up.  But there are two aspects which I think are very compelling.  Firstly, the Data Explorer plugin – now Power Query – is a fantastic piece of kit which on its own can dramatically expand the range of data available to analysts.  No doubt the new version is more refined, but by being able to connect to online data sources including social media feeds (e.g. Twitter) and Hadoop-based technologies, it really opens up the Big Data world for organisations.  The second aspect is that this now all comes “with” Excel.  It’s all integrated. No additional third-party plugins or licences are needed.  From a support perspective that’s very compelling.

For now this has been announced as only available in Office 365, which goes along with Microsoft’s cloud-first strategy, but I don’t think it will be long before we see this available for the desktop version.  You could argue most of it is there already.

You can read more about the announcement here:

http://blogs.office.com/b/office-news/archive/2013/07/08/announcing-power-bi-for-office-365.aspx

and some industry views here:

http://www.jenstirrup.com/2013/07/power-business-intelligence-for.html
http://cwebbbi.wordpress.com/2013/07/08/some-thoughts-about-power-bi/

 


Jun
03

States of Jersey Hansard Results Update

Following a brief conversation with Senator Ozouf, who kindly took the time to read my last post, I took another look at the name matching as he felt the figures for the Bailiff were high. I succeeded in eliminating the known (and a couple of previously unknown) false matches and have re-run the analysis. There are a couple of minor differences, but the overall picture looks the same.

There may well be further improvements that could be made. I encourage you to connect to the data model yourself and look around. If you find something that looks wrong, you can double-click on the value in the pivot table and Excel will load the data rows that make it up; in there you will find the transcripts the values represent. If you do find a problem, please let me know!

Here are updated screenshots:

All time top speakers – Senator Le Claire pushes Deputy Tadier down into fourth place:

Top speakers since the last elections – no change in positions:

Speakers by Position – some movements here; you can see significant downward movement as the false matches are stripped out:

(Note that the changes in this table are not expected to sum to zero as they are averages, not absolute values.)

 


May
31

Big Data – A Small Example: States of Jersey Hansard HDInsight Analysis Results

This post forms part of our “Big Data – A Small Example” series and describes the results and provides an evaluation of the technologies involved. (EDIT: Please see this post for updated results and screenshots.)

The story so far…

To get here we first screen-scraped the States of Jersey Assembly website to collect Hansard transcripts and Propositions, loaded them into HDInsight on a Windows Azure virtual machine to analyse the transcripts for the contributions of each States Member, then cleaned the results with Data Quality Services and loaded them into a SQL Server 2012 data warehouse with an Analysis Services Tabular model. In this post I will show you some of the results – and give you access to the data model yourself, so that you can do your own analysis! The process looked like this:

This post will be split into two sections: one about the data, the other about the technologies (just in case you’re not interested in that bit).

What can the data tell us?

Connecting to the data model in Excel gives us a lot of data, but it’s not very meaningful. In the following screen shots I’ve pivoted the ‘Approx. Words or Qs’ measure, with the States Members on rows and years on columns, and filtered the Transcript Type to ‘Debate’. I’ve colour-coded the values (red is high, green is low) to give you a visual cue:

 

The analysis tells us how much (roughly) has been said, by whom. On its own this tells us nothing, because it can’t tell us whether the Member in question is making important points or reciting poetry. As ever in life, it’s important to ask the right questions. Taking the pivot table above (sans colouring), ordering by total words spoken and adding some calculations to determine percentage of total words and a running total gives us this:

 

Immediately we spot something interesting. Over a quarter of all words spoken since the transcripts began can be attributed to just six people, and a fifth to the top four! But these names include two who are no longer States Members, and one who was elected to the States several years after the transcripts began – Deputy Tadier storming into the third place slot. So maybe this isn’t a fair comparison. Let’s narrow it down to just debates since the last election (or, more accurately, since those elected at the last election were sworn in):

 

Now the skew is even more pronounced with just four States Members clocking up 28% of the airtime. But this shouldn’t really surprise us. Deputies Tadier and Southern are Scrutiny Panel Chairmen, Senator Gorst is the Chief Minister and Senator Ozouf the Treasury and Resources Minister. These are positions of responsibility – so let’s slice the numbers by position instead.

In this pivot table we are using the “Per Position” measure, which calculates the number of words spoken per position holder. This enables us to do like-for-like comparisons of, say, the Chief Minister (a position held by one person, so it is effectively all the words they’ve said) to a Scrutiny Panel Member (of whom there are several, so their words have to be divided by the number of them). This is what we find:

 

This makes sense: Chief Ministers and Ministers say the most followed by Scrutiny Chairmen and Panel members: the people who you’d expect to be most involved in debates. What surprised me was just how much was said by the Bailiff. However, the Bailiff (and the Deputy Bailiff or Greffier when they are in the Chair) conduct the debates, calling Members to speak and ruling on points of order and so forth.

What about the subjects of debates? What gets Members on their feet and talking? Removing dates from columns and swapping our Positions for Propositions we get:

 

The first thing this highlights is that it was much harder to identify the subject than the speaker. In over a quarter of debates the subject was not determined. More on why in a moment. However, we can get a good sense of the controversial stuff: Budgets and Plans.

Notes on Accuracy

Debates

Analysing free-text documents like Hansard transcripts is not easy as there is little or no predictable structure. Identifying speakers and subjects is an exercise in pattern matching. On the whole, all the paragraphs are attributed to someone (the exceptions being vote results and appendices which are stripped out) and the ‘Unknown’ values appearing in some pivot tables are the result of false-positive matches. While it may well be feasible to eliminate these false-positives, this is subject to the law of diminishing returns and we could only take it so far.

Results could also be thrown out by misattributing paragraphs to States Members. Diagnosing and resolving these issues is, again, subject to diminishing returns for effort, and while it is reasonably accurate there are some known cases involving small numbers of words. In general in these cases it is most likely they have been misattributed to the Bailiff, as he usually speaks between each Member.

There is a low but real margin of error; therefore, the numbers attributed to any States Members should be regarded as indicative rather than absolute.
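Purely as an illustration of the kind of pattern matching involved – this is not the actual HDInsight pipeline, and the file name and patterns below are made up for the example – attributing each paragraph to the most recently seen speaker heading might look something like this:

# Illustrative sketch only – not the real pipeline. Assumes a plain-text transcript
# (transcript.txt is a made-up file name) where each contribution starts with a line
# such as "The Bailiff:" or "Senator Ozouf:".
$speaker = 'Unknown'
Get-Content .\transcript.txt | ForEach-Object {
    if ($_ -match '^(Senator|Deputy|Connetable|The Bailiff|The Greffier)[^:]{0,60}:') {
        $speaker = $Matches[0].TrimEnd(':')    # a new speaker heading
    }
    elseif ($_.Trim().Length -gt 0) {
        # attribute this paragraph's approximate word count to the current speaker
        [pscustomobject]@{ Speaker = $speaker; Words = ($_ -split '\s+').Count }
    }
}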

Subjects

Identifying the subject relied on the Chair reading the Projet [sic] number and this being matched to the list of screen-scraped Propositions. Where this doesn’t happen, we can’t determine the subject under debate. This was a common occurrence when public business is resumed after adjournments or on follow-on days.

Written Questions

Approximate counts of written questions are available in the data model; however, these have been subject to the least testing and suffer from several known problems. For example, in one case the transcript records dialogue between the Bailiff and another Member, and this results in the Bailiff being credited with two written questions – which is clearly false. Treat this measure with caution.

Would you like to query the data model yourself?

Why not? It’s sitting there in the cloud. So long as you have Excel you can connect to the cube and do your own analysis.

Disclaimer: CYA time! The data model is offered ‘as-is’ with no guarantees as to its accuracy, etc. What you do with it is on you.

  1. Download this ODC (Office Data Connection) file: Altius SoJ Hansard Tabular Model
  2. Open Excel.
  3. From the ‘Data’ tab on the ribbon click ‘Connections’.
  4. Click ‘Add’, then ‘Browse for more’.
  5. Navigate to where you saved the ODC file and select it, then click ‘Open’.
  6. Click ‘Close’.
  7. Click ‘Existing connections’ also on the Data tab of the ribbon.
  8. ‘Altius SoJ Hansard Tabular Model’ should appear under the ‘Connections in this Workbook’ section.
  9. Select it and click ‘Open’.
  10. Click ‘OK’ in the ‘Import Data’ dialog that appears.
  11. A blank pivot table will appear, with the available fields in a panel on the right.

If you’re wary of downloading files off the web (and you should be, well done!) and know what you’re doing, create a new ODC file and edit the connection string to this:

Provider=MSOLAP.5;Persist Security Info=True;Initial Catalog=HansardTabular;Data Source=altiussojhansard.cloudapp.net:60151;Impersonation Level=Anonymous;Location=altiussojhansard.cloudapp.net:60151;MDX Compatibility=1;Safety Options=2;MDX Missing Member Mode=Error

For more help on ODC files see here.
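Alternatively, if you’d rather poke at the model from a script than from Excel, the sketch below uses ADOMD.NET from PowerShell. This is only a rough illustration: it assumes the ADOMD.NET client libraries (from the SQL Server feature pack) are installed locally, and it uses a trimmed-down connection string – you may need to carry over the extra options from the full connection string above. The DMV query simply lists the measures the model exposes.

# Rough sketch – assumes the ADOMD.NET client libraries are installed locally
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices.AdomdClient")

$connStr = "Data Source=altiussojhansard.cloudapp.net:60151;Initial Catalog=HansardTabular"
$conn = New-Object Microsoft.AnalysisServices.AdomdClient.AdomdConnection($connStr)
$conn.Open()

# Ask the model which measures it exposes via a schema rowset (DMV) query
$cmd = $conn.CreateCommand()
$cmd.CommandText = 'SELECT MEASURE_NAME FROM $SYSTEM.MDSCHEMA_MEASURES'
$reader = $cmd.ExecuteReader()
while ($reader.Read()) { $reader.GetString(0) }
$reader.Close()
$conn.Close()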

If you have questions, comments, feedback, have suggestions for calculations you’d like to see or find errors, please feel free to e-mail me: charles dot robertson at altius dot je, or tweet me @charles_jsy.

How could we expand the analysis?

For simplicity we simply counted the number of words. However, there is so much more that could be done:

  • Sentiment analysis: who is being positive, and who is being negative?
  • Semantic analysis: what is being spoken about?
  • Word clouds: what words do Members like to use?
  • Graph analysis: who are regular interlocutors?
  • Whose propositions most often get passed or rejected?

With a little more effort you could even compare manifesto promises to what actually happens…

 Technical Summary

If you’re not a techie, look away now!

HDInsight

You need to be comfortable using the command line and getting your hands dirty, because there is no WYSIWYG UI and no wizards. That said, it is an incredibly powerful tool.

So what is it for? Well, it’s not a magic wand. It doesn’t magically do analysis for you. It’s a framework within which to do analysis. The primary advantage of the Hadoop side is the massive scaling potential – just add nodes and your computation capacity goes up. However, your data does need to be in an appropriate format. If you’re dealing with a single file of well-ordered data you’d be better off scaling a traditional ETL architecture. HDInsight comes into its own when you have large and/or varied data sets and especially if you need to do more flexible, programmatic analysis as we did in this example.

Having Hive on top is also powerful. Hive would be especially useful in one-off scenarios where an analyst is manipulating the data rather than it being part of an automated process – and its string-querying ability makes T-SQL look very poor.

Azure

Windows Azure was easy to use and was a delight for scaling resources as I needed them. For prototyping and distributed development efforts it is definitely worth consideration.

SSAS Tabular

Much quicker to develop in than Multidimensional models, but doesn’t always behave as you might expect coming from an MDX background. DAX is powerful, but you need to take the time to understand how it works.

Conclusion

We’ve seen in this series of blogs techniques and tools which enable us to perform advanced analysis on sources of data previously out of reach. HDInsight is a great tool, but it only replaces a BI developer’s tools under a few circumstances (mainly to do with speed of analysis). What it does do is open up whole new fields of potential analysis. Its flexibility and scalability means you can tackle problems which, because of size, variability or lack of structure, simply weren’t possible before. Finally, as I have shown above, even with new and more powerful analytical tools, context is still king and success depends on knowing what questions you want answers to.

