David DeWitt and Rimma Nehme delivered the day 2 keynote at PASS Summit 2015 talking about Internet of Things. Many PASS attendees know David for providing some of the best keynotes every, technical and without vaporware and marketing stuffs. So this is a good time to discuss about the real stat of Internet of Things (IoT) these days.
It’s easy to talk about IoT, but it’s hard to enter into this complex and heterogeneous world. There are many type of devices, and a simple categorization is consumer vs. industrial, which have very different requirements and features (not to mention cost, power, and standards used). The key to connect IoT devices is the cloud, but communication happens in two directions: from device to cloud, and from cloud to device. There requires different technologies and often different devices, because many of them have only a single role (sending data or receiving commands, sensors or actuators, but certain devices might do both).
As you can imagine, this is a real source of Big Data. I often see Big Data used to manage data that are generated into a structured relational database, which seems to be a non-sense. But with the volume of data generated by IoT this technology makes perfect sense. Of course, Azure has a lot of technologies that helps you manage this amount of data, but since the topic of this blog is Business Intelligence, I’m more interested to what happens when you want to analyze data.
Here, a few technologies that have a certain history (if you know former names) came into play. Data Mining (also known as Machine Learning these days) can be fundamental to make predictions based on previous behaviors. David is great in providing simple examples to explain the concept: a boiler has a pressure sensor and you have to open valve before boiler explodes. You can train the algorithm for predicting boiler failure, or you can provide built-in intelligence in the algorithm, with a predefined limit of bar to open valve. The analysis of this data requires real-time stream analysis, relying on the cloud for this real-time analysis would generate too much traffic and would also have higher latency (dangerous for this type of applications).
Here it comes a new “definition”: fog computing, also known edge computing. The idea is to not move the data to computation, but to move computation to data. However, the IoT is a database problem, which is not managed in this way these days. And this is the main point of this keynote. Proposing a Polybase for IoT that includes:
- Declarative language: today IoT is based on imperative languages, whereas the goal is to introduce declarative language, such as IoT-SQL (imagine to add a WINDOW and ACTION suffix to a classical SQL query, so that it can act on a range of time, triggering an action when certain conditions happens)
- Complex object modeling: define a standard structure to identify IoT locations in a hierarchical structure (imagine an object model with an API to navigate hierarchies of objects, traversing path and similar stuffs – similar to many MDX statements we know well)
- Scalable metadata management: simple abstraction (Metadata, statistics, access privileges) unified to access different devices; metadata includes collection of standard and extended attributes
- Discrete & Continuous Queries: different query types, such as ExecuteOnce (like a standard SQL), ExecuteForever (continuous flow of responses from device), ExecuteAction (such as ExecuteForever plus an action to execute in defined conditions)
- Multi-purpose queries: here is the smart idea. With a definition of the process at an higher level, the decision of moving the work in the cloud or at the edge (in the fog) is made by the query optimizer, creating a real query plan that distributes the actual work to different parts of the system depending on the requests
This approach is really ambitious, but an important part of it is the idea of embedding security in the system. If you think about the future of IoT, security is of paramount importance. I don’t know if this will be the future of IoT, but this speech raised points that have to be faced, sooner better than later.
You will be able to watch this keynote soon at PASStv on demand.
Retrieves a range of rows within the specified partition, sorted by the specified order or on the axis specified.
WINDOW ( <From> [, <FromType>], <To> [, <ToType>] [, <Relation>] [, <OrderBy>] [, <Blanks>] [, <PartitionBy>] [, <MatchBy>] [, <Reset>] )