Introduction. This article is also available on YouTube. Replication in distributed systems means that each piece of data has more than one copy and each copy is located on a separate node. There are a few reasons to adopt replication: to achieve data redundancy and therefore allow a system...
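The idea of keeping several copies of each piece of data on separate nodes can be sketched in a few lines of Python. This is a toy in-memory model, not a real replication protocol: the `Node` and `ReplicatedStore` classes, the replication factor of 2, and the CRC32-based replica placement are all assumptions made for illustration.

```python
import zlib


class Node:
    """A hypothetical storage node holding its own copy of some keys."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class ReplicatedStore:
    """Writes every key to `replication_factor` distinct nodes."""

    def __init__(self, nodes, replication_factor=2):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds the number of nodes")
        self.nodes = nodes
        self.replication_factor = replication_factor

    def replicas_for(self, key):
        # Pick a starting node deterministically, then take its neighbours.
        start = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return [
            self.nodes[(start + i) % len(self.nodes)]
            for i in range(self.replication_factor)
        ]

    def put(self, key, value):
        # Each copy lands on a separate node.
        for node in self.replicas_for(key):
            node.store[key] = value

    def get(self, key, unavailable=()):
        # Skip nodes that are marked as down and read from a surviving copy.
        for node in self.replicas_for(key):
            if node.name not in unavailable and key in node.store:
                return node.store[key]
        raise KeyError(key)


nodes = [Node("node-1"), Node("node-2"), Node("node-3")]
store = ReplicatedStore(nodes, replication_factor=2)
store.put("user:42", "Alice")

# Simulate losing the first replica: the read is served by the second copy.
first_replica = store.replicas_for("user:42")[0]
value_after_failure = store.get("user:42", unavailable={first_replica.name})
```

With a replication factor of 2, the value stays readable after any single node becomes unavailable, which is the redundancy argument in its simplest form.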
Java
Brief introduction to Alibaba’s Alink: A Flink-based ML Platform and Text Sentiment Analysis using Logistic Regression
What is Alink? Alink is a machine learning algorithm platform based on Flink, developed by the PAI team of the Alibaba computing platform. Alink contains a variety of ready-to-use ML algorithms and can even compete with Python-based frameworks and libraries on the number of features. Some of the features are also...
Saving Spark DataFrame to Apache Cassandra using Datastax Spark Connector
Hi! Today we’re going to look at the Datastax Spark Cassandra Connector. Topics covered in this video: generating a test CSV dataset using Python; creating a schema in Cassandra; preparing a Jupyter workbench; reading the CSV into a DataFrame; writing the DataFrame to Cassandra.
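As a rough illustration of the first topic, here is a minimal sketch of generating a test CSV dataset with Python's standard library. The column names (`id`, `name`, `score`), the row count, and the `build_test_csv` helper are hypothetical and are not taken from the video.

```python
import csv
import io
import random


def build_test_csv(num_rows, seed=0):
    """Return CSV text with a header row and `num_rows` generated rows."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "name", "score"])  # hypothetical columns
    for row_id in range(1, num_rows + 1):
        writer.writerow([row_id, f"user_{row_id}", rng.randint(0, 100)])
    return buffer.getvalue()


csv_text = build_test_csv(5)

# Parse the text back to check the shape of the generated dataset.
rows = list(csv.reader(io.StringIO(csv_text)))
```

Writing `csv_text` to a file would give a dataset that can then be read into a DataFrame.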
How, Where and When Apache Kafka Writes Data (Page Cache included)
I recently heard that Apache Kafka does not store data on disk. Yeah, I’m not kidding: my mate told me that Kafka manages most of its operations in memory and doesn’t rely heavily on disks. That seemed odd to me, and today I will try to investigate Kafka’s persistence process...
Message Delivery Semantics in Distributed Systems
Today we are going to talk about message delivery semantics in Apache Kafka. Since interactions using Kafka are mostly asynchronous, there are no direct connections between client and server. Moreover, there are no clients and servers in their usual sense at all. Apache Kafka serves as an intermediate asynchronous...
Increasing reliability/durability of Apache Kafka
What comes to mind when you hear the term “Distributed Messaging”? Well, not even that, let’s rephrase: what are your expectations of messaging in distributed systems? I suppose that “reliability” will occupy one of the top positions in the list of answers. What sense do we put into...
Beware of Message Ordering In Apache Kafka Topic Partitions
Hi! I’ve started a new series on my YouTube channel named “Alex Tried to Understand”. It will consist of short clips, each dedicated to a specific technical topic. The first episode is dedicated to the ordering guarantees in Apache Kafka. Synopsis: The official docs state that Kafka preserves message ordering inside a...
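To make the per-partition ordering point concrete, here is a small simulation in plain Python. It does not use Kafka or any Kafka client: the partition count, the CRC32-based `pick_partition` function, and the sample keys are assumptions chosen for illustration, and Kafka's own default partitioner uses a different hash.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count


def pick_partition(key, num_partitions=NUM_PARTITIONS):
    """Route a key to a partition with a deterministic hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(messages, num_partitions=NUM_PARTITIONS):
    """Append each (key, value) message to the log of its partition."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[pick_partition(key, num_partitions)].append((key, value))
    return partitions


# Messages are produced in this global order.
messages = [
    ("user-a", 1),
    ("user-b", 1),
    ("user-a", 2),
    ("user-b", 2),
    ("user-a", 3),
]
partitions = produce(messages)

# Within one partition, messages with the same key keep their send order.
values_for_a = [
    value
    for key, value in partitions[pick_partition("user-a")]
    if key == "user-a"
]
```

The simulation keeps the send order for each key because all messages with that key land in the same partition log. Nothing in it defines an order between messages stored in different partitions.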
Add healthchecks for Apache Kafka in docker-compose
Let’s say we have an application inside a Docker image that connects to Apache Kafka on startup and tries to publish a bunch of messages to topics. Now let’s imagine that we want to test the app locally and have therefore created a docker-compose.yml with Apache Kafka, Zookeeper, and the application...
What is Dozer? It’s a mapper
Sometimes a project needs to copy the attributes of one object into another, occasionally not directly but with some transformations, such as extracting a substring or, conversely, combining several fields of the parent object into a single field of the child. Such transformations are called mappings. They are most often used where a Java object has to be passed along without redundant sensitive data. About...
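The kind of mapping described here can be sketched in plain Python. This is not Dozer, which is a Java library, and it does not reproduce its API: the `UserEntity` and `UserDto` classes, their fields, and the `map_user` function are hypothetical names used only to show the idea of combining fields, extracting a substring, and leaving sensitive data behind.

```python
from dataclasses import dataclass


@dataclass
class UserEntity:
    """Hypothetical source object that includes a sensitive field."""
    first_name: str
    last_name: str
    email: str
    password_hash: str


@dataclass
class UserDto:
    """Hypothetical target object that is safe to pass along."""
    full_name: str
    email_domain: str


def map_user(entity):
    """Map a UserEntity to a UserDto, dropping the sensitive field."""
    return UserDto(
        # Combine several source fields into a single target field.
        full_name=f"{entity.first_name} {entity.last_name}",
        # Extract a substring: everything after the "@" sign.
        email_domain=entity.email.split("@", 1)[-1],
    )


entity = UserEntity("Ada", "Lovelace", "ada@example.com", "s3cr3t-hash")
dto = map_user(entity)
```

The target object carries only the derived fields, so the password hash never leaves the source object.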
Static Compilation (AOT) vs. Dynamic Compilation (JIT)
When it comes to compilation (not only in Java), at least two types can be named. Let’s check out the differences. Let’s say you are developing an application that consists of only one class. After the job is finished, you definitely want to run the code. What...