Data Science - CompleteStream March 2017

We recently presented at CompleteStream 2017 in Melbourne. It was a good turnout, with some great presentations by knowledgeable practitioners.

We began with a brief example that emphasises one of the less talked-about benefits of machine learning, one that is pertinent in SAP environments: its ability to replace hard-coded decision logic. In a sense, a machine learning model contains logic that is determined by feeding it data. Following this, we went straight into a discussion of what machine learning is, using a linear model as an example and as a basis for further explanation. I often find this topic poorly explained: every day there seems to be a new high-level blog post or product launch, but very few publications deal with the technical side. I tried to get across that optimisation, especially gradient descent, often underpins these techniques. I described this only briefly; for a good technical discussion of machine learning, see here and here.
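To make the linear model and gradient descent point concrete, here is a minimal sketch in plain Python. It is not the code from the presentation; the data, learning rate and step count are made up for illustration, and the function name fit_linear is hypothetical.

```python
# Minimal sketch: fit y = w*x + b by gradient descent on mean squared error.
# The data, learning rate and step count are illustrative assumptions.

def fit_linear(xs, ys, learning_rate=0.05, steps=5000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Take a small step downhill
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

Nothing in the code states that the slope is 2 and the intercept is 1; the fitted values should land close to those numbers purely because the data says so. That is the sense in which a model's logic is determined by the data it is fed rather than hard-coded.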

We looked at a typical day for a ‘data scientist’ and included a screenshot from a ‘typical’ desktop, which is very different from how the role is often promoted!

Following on from this, we categorised (as much as one can) the data science process, and pointed out that much of it is associated with handling data: retrieving, understanding, cleaning, transforming, etc. In typical corporate environments, a relatively small amount of time is actually spent on model building.

Once a model is built, there are many knobs and dials to turn in order to improve it. In the introductory example we discussed the value of tuning a model, and we listed some of the many options a predictive modeller has at their disposal. However, we made the point later on that killer features are what contribute most to the accuracy of a model.
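As a rough illustration of turning one such knob, the sketch below tunes k in a k-nearest-neighbours classifier against a held-out validation set. This is an assumed example rather than anything shown in the talk: the data is synthetic, the candidate values of k are arbitrary, and the helper names knn_predict and accuracy are hypothetical.

```python
# Minimal sketch: tune one "knob" (k in k-nearest-neighbours) on a validation set.
# The data is synthetic and the candidate values are arbitrary assumptions.
import random

def knn_predict(train, x, k):
    # train is a list of (feature, label) pairs; vote among the k closest points
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, validation, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)

random.seed(0)
# Label is 1 when x > 0.5, with roughly 15% of labels flipped as noise
points = []
for _ in range(300):
    x = random.random()
    label = 1 if x > 0.5 else 0
    if random.random() < 0.15:
        label = 1 - label
    points.append((x, label))

train, validation = points[:200], points[200:]
candidates = (1, 3, 5, 11, 21)  # odd values avoid tied votes
scores = {k: accuracy(train, validation, k) for k in candidates}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The same pattern (try a setting, score it on data the model has not seen, keep the best) applies to most tuning options, although a better feature will often move the score more than any setting will.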

We looked at SAP’s suite of products for data science, which included:

HANA R
HANA PAL
Predictive Analytics
- Expert Analytics
- Automated Analytics

We even undertook a benchmark of two different implementations of the random forest algorithm, comparing the one from HANA’s Predictive Analysis Library (PAL) with the equivalent in the popular R language.

Finally, we got to the business end of the topic, with a discussion around data science proofs of concept and where they sit with regard to implementation.

Our concluding remarks were as follows:

Machine learning isn’t a magic bullet; it’s one of many tools employed by data scientists in their work.
Much of this work is preparing data.
There are many tools for carrying out data science; it’s important to know what they are and where they fit in your corporate environment.
Develop on anything, implement in HANA.
Regardless of the modelling engine, HANA will be needed to build the initial data structures.

This is a short summary of our presentation to give you a flavour of what we presented. For the entire presentation, please follow this link to Data Science in SAP – Tell him he’s dreaming.

Michael Plazzer
