Hey! This post will be of interest to my Russian-speaking readers. I recently started a podcast called “I’m trying to understand”. It is available on both Apple Podcasts and Yandex Music.
Lightbend Cloudflow is now deprecated
Today I accidentally found out that one of the most important open source projects that literally boosted my career is now deprecated. This is about Lightbend Cloudflow. Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes. Cloudflow allows you to easily break down your streaming...
MLOps Maturity Levels according to Google
Hey! This article is also available on YouTube. What is MLOps? MLOps is an operational methodology that aims to make it easier to bring an experimental Machine Learning model into production and maintain it efficiently. MLOps focuses on bringing the DevOps methodology used in the software industry to...
Why replication? A story of single-leader / master-slave / active-passive replication in distributed systems.
Introduction. This article is also available on YouTube. Replication in distributed systems means that each piece of data has more than one copy and each copy is located on a separate node. There are a few reasons to adopt replication: to achieve data redundancy and therefore allow a system...
Brief introduction to Alibaba’s Alink: A Flink-based ML Platform and Text Sentiment Analysis using Logistic Regression
What is Alink? Alink is a Machine Learning algorithm platform based on Flink, developed by the PAI team of the Alibaba computing platform. Alink contains a variety of ready-to-use ML algorithms and can even compete with Python-based frameworks and libraries on the number of features. Some of the features are also...
Saving Spark Dataframe to Apache Cassandra using Datastax Spark Connector
Hi! Today we’re going to look at the Datastax Spark Cassandra Connector. Topics covered in this video: generating a test CSV dataset using Python; creating a schema in Cassandra; preparing a Jupyter workbench; reading the CSV into a DataFrame; writing the DataFrame to Cassandra.
How, Where and When Apache Kafka Writes Data (Page Cache included)
I recently heard that Apache Kafka does not store data on disks. Yeah, I’m not kidding: my mate told me that Kafka manages most of its operations in memory and doesn’t rely heavily on disks. That seemed odd to me, and today I will try to investigate Kafka’s persistence process...
Apache Cassandra Write Path, Compaction and Use Cases in 3 Minutes
Over the last few years, Apache Cassandra has become one of the most popular NoSQL solutions for big data. It started back in 2008 as an open-sourced product from Facebook, became an Apache Incubator project in 2009, and graduated to a top-level project in 2010. This is the three-minutes-tech series, I’m Alex Sergeenko and...
Message Delivery Semantics in Distributed Systems
Today we are going to talk about message delivery semantics in Apache Kafka. Since interactions through Kafka are mostly asynchronous, there are no direct connections between client and server. Moreover, there are no clients and servers in the usual sense at all. Apache Kafka serves as an intermediate asynchronous...
Feature Store 101: Machine Learning
What is a feature store? Machine learning does not operate solely based on models. To make predictions you also need features. According to Wikipedia, a feature is an individual measurable property or characteristic of a phenomenon. But this definition may seem confusing and needs to be clarified. In simple terms,...