The world is data-centric, and nearly everything moves based on data. SQL Server is a database management system that takes care of every aspect of data, from second-to-second management through online transaction processing (OLTP) to mining stored and archived data, including streamed data, through online analytical processing (OLAP).
Data and Data Science:
Raw data can be enormous, and analytics is the only way to give it meaning and extract value from it. Data science professionals are the ones who carry out this advanced analytics. They go through the mountain of data and distil the most useful and most relevant information. The success of a company or organization can depend on these professionals. Use cases include credit risk assessment, managing customer or employee churn, hospitalization metrics, and targeted sales and campaigns, among many others not mentioned here.
Data Science using SQL Server:
SQL Server is a comprehensive database management system well suited to everything related to data, from OLTP data to OLAP data. It can run both on-premises and in the cloud, and it integrates with the Azure cloud, which in turn integrates with a myriad of other data-related applications from Microsoft and from other vendors. Doing data science with SQL Server can therefore deliver high value and high returns.
Data scientists can connect to a myriad of databases and use the data in them to train and test the machine learning models they are working with.
Where does AI come in or fit in?
In a typical scenario, data scientists access data from the client's database and combine it with data from other sources, using R to join the data and filter it based on criteria. They may then shape the data by creating extra informational features, such as new columns or partitioned or transformed data. This data shaping lays the foundation for predictive analytics. Going a step further, they can operationalize the resulting predictive model so that applications can use it to produce useful outputs.
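The join, filter, and feature-creation steps described above can be sketched in a few lines. The article's workflow uses R; the sketch below uses plain Python purely for illustration, and every name in it (the `customers` and `payments` rows, the `shape` function, the `late_per_year` feature) is a hypothetical example rather than anything taken from a real dataset.

```python
# Hypothetical rows already pulled from the client's database.
customers = [
    {"customer_id": 1, "tenure_months": 12},
    {"customer_id": 2, "tenure_months": 3},
    {"customer_id": 3, "tenure_months": 24},
]

# Hypothetical rows from a second, external source.
payments = [
    {"customer_id": 1, "late_payments": 2},
    {"customer_id": 2, "late_payments": 1},
    {"customer_id": 3, "late_payments": 0},
]


def shape(customers, payments, min_tenure=6):
    """Join the two sources, filter on a criterion, and add a derived feature."""
    late_by_id = {p["customer_id"]: p["late_payments"] for p in payments}
    shaped = []
    for c in customers:
        if c["customer_id"] not in late_by_id:
            continue  # inner join: keep only customers present in both sources
        if c["tenure_months"] < min_tenure:
            continue  # filter: drop customers with too little history
        row = dict(c)
        row["late_payments"] = late_by_id[c["customer_id"]]
        # New informational feature: late payments normalised to a yearly rate.
        row["late_per_year"] = row["late_payments"] * 12 / row["tenure_months"]
        shaped.append(row)
    return shaped


features = shape(customers, payments)
for row in features:
    print(row)
```

The shaped rows are what a model would later be trained on. Note that all three input rows had to leave the database before the filter could discard one of them.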
This is just one scenario, in which data is taken out of the database and worked on to get at useful information by creating models. How easy is it? Moving large amounts of data in and out of a database comes at a cost. There are other considerations as well: the location of the data, the security of the data (not mentioned so far), the latency when geographically separated sites are involved, and the loss of DBMS features such as indexing, columnstore indexes, and high availability once the data has left the database. Taken together, these suggest it may be prudent to do as much of the filtering and shaping as possible inside the database, using all the tools the DBMS provides, instead of working on raw data pulled out of it.
Given the considerations above, it is beneficial to carry out data science and AI on the existing database before moving any data out. This lets you leverage the built-in DBMS features discussed earlier. If the data includes geographical data, it can be handled with SQL Server's built-in spatial data types. If, on the other hand, data has to be accessed from sources outside of Microsoft products, the linked server feature of the DBMS can be used.
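The difference between the two approaches can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server, only so that the example runs anywhere; the `loans` table, its columns, and the index name are hypothetical. The first approach pulls every row out and filters on the client, while the second lets the database filter and aggregate so that a single value crosses the boundary.

```python
import sqlite3

# In-memory SQLite database standing in for the production DBMS (an assumption
# made for portability; the article itself is about SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans ("
    "loan_id INTEGER PRIMARY KEY, region TEXT, amount REAL, defaulted INTEGER)"
)
conn.executemany(
    "INSERT INTO loans (region, amount, defaulted) VALUES (?, ?, ?)",
    [
        ("north", 1000.0, 0),
        ("north", 2500.0, 1),
        ("north", 1800.0, 0),
        ("north", 900.0, 1),
        ("south", 4000.0, 0),
        ("south", 1200.0, 0),
    ],
)
# An index the database can use for the filter; client-side code gets no such help.
conn.execute("CREATE INDEX idx_loans_region ON loans (region)")

# Approach 1: move all rows out, then filter and aggregate on the client.
all_rows = conn.execute("SELECT region, amount, defaulted FROM loans").fetchall()
north = [r for r in all_rows if r[0] == "north"]
client_rate = sum(r[2] for r in north) / len(north)

# Approach 2: filter and aggregate inside the database; one value comes back.
(in_db_rate,) = conn.execute(
    "SELECT AVG(defaulted) FROM loans WHERE region = ?", ("north",)
).fetchone()

print(len(all_rows), client_rate, in_db_rate)
```

Both approaches give the same default rate, but the first transfers the whole table and the second transfers one number. With millions of rows and a remote site, that difference is where the cost and latency concerns come from.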
The following picture copied from a Microsoft site shows the two ways discussed so far.
Another important consideration, once the predictive analytics is done, is operationalizing what data science has produced, namely the predictive model. It can be deployed to a production environment programmatically through SQL Server R Services, in the form of a stored procedure. The predictive model itself is stored as varbinary(max) in a database table.
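The idea of keeping a model as binary data in a table and scoring against it can be sketched as follows. This is not the SQL Server R Services mechanism itself: the sketch assumes Python's `pickle` for serialization and a SQLite `BLOB` column as a stand-in for `varbinary(max)`, and the `models` table, the `churn_v1` name, the toy coefficients, and the `score` function are all hypothetical.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
# BLOB plays the role that varbinary(max) plays in SQL Server.
conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, payload BLOB)")

# A toy "trained model": a linear score described by its coefficients.
model = {
    "intercept": 0.5,
    "coef": {"late_payments": 0.25, "tenure_years": -0.125},
}

# Serialize the model to bytes and store it in the table.
conn.execute(
    "INSERT INTO models (name, payload) VALUES (?, ?)",
    ("churn_v1", pickle.dumps(model)),
)


def score(conn, model_name, row):
    """Load the stored model by name and score one row of features.

    This plays the part of the stored procedure that applications would call.
    """
    (payload,) = conn.execute(
        "SELECT payload FROM models WHERE name = ?", (model_name,)
    ).fetchone()
    m = pickle.loads(payload)  # only unpickle data from a trusted table
    return m["intercept"] + sum(w * row[f] for f, w in m["coef"].items())


prediction = score(conn, "churn_v1", {"late_payments": 2, "tenure_years": 4})
print(prediction)
```

Storing the model next to the data means an application only needs a database connection to get predictions, and a retrained model can replace the old one by updating a single row.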
Look forward to more discussions related to SQL Server and AI.