Today I accidentally found out that one of the most important open-source projects, one that literally boosted my career, is now deprecated: Lightbend Cloudflow. Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes. Cloudflow allows you to easily break down your streaming...
Big Data
Brief introduction to Alibaba’s Alink: A Flink-based ML Platform and Text Sentiment Analysis using Logistic Regression
What is Alink? Alink is a machine learning algorithm platform based on Flink, developed by the PAI team of the Alibaba computing platform. Alink contains a variety of ready-to-use ML algorithms and can even compete with Python-based frameworks and libraries on the number of features. Some of the features are also...
Saving Spark DataFrame to Apache Cassandra using DataStax Spark Connector
Hi! Today we’re going to look at the DataStax Spark Cassandra Connector. Topics covered in this video: generating a test CSV dataset using Python; creating a schema in Cassandra; preparing a Jupyter workbench; reading the CSV into a DataFrame; writing the DataFrame to Cassandra.
Feature Store 101: Machine Learning
What is a feature store? Machine learning does not operate solely based on models. To make predictions you also need features. According to Wikipedia, a feature is an individual measurable property or characteristic of a phenomenon. But this definition may seem confusing and needs to be clarified. In simple terms,...
Window Operations in Stream Processing
This post is based on “Stream Processing with Apache Flink”, the book written by Vasiliki Kalavri and Fabian Hueske. A few types of window operations are most commonly used in stream processing: tumbling windows, sliding windows, and session windows. Today we will...
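To make the three window types concrete, here is a minimal sketch in plain Python of how a timestamp could be assigned to each kind of window. It is not Flink code and not taken from the book: the function names, the integer timestamps, the half-open `[start, end)` intervals, and the rule that an event arriving exactly `gap` units after the previous one starts a new session are all assumptions made for illustration.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end), an assumed convention


def tumbling_window(ts: int, size: int) -> Window:
    """Return the single fixed-size, non-overlapping window containing ts."""
    start = ts - (ts % size)
    return (start, start + size)


def sliding_windows(ts: int, size: int, slide: int) -> List[Window]:
    """Return every window of length `size`, started every `slide` units, containing ts."""
    windows = []
    start = ts - (ts % slide)  # latest window start that is <= ts
    while start > ts - size:   # keep going back while the window still covers ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def session_windows(timestamps: List[int], gap: int) -> List[Window]:
    """Group timestamps into sessions separated by at least `gap` units of inactivity."""
    sessions: List[Window] = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # Event falls inside the current session: extend its end.
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            # Inactivity gap elapsed: open a new session.
            sessions.append((ts, ts + gap))
    return sessions
```

With these definitions, a tumbling window places each event in exactly one window, a sliding window with `size` larger than `slide` places it in several overlapping ones, and session windows have no fixed length because their boundaries depend on the data.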