cassandra

Cassandra programming resources
Once you start working with Big Data systems, you discover a whole bunch of problems you won’t find in monolithic systems. Monitoring all of the components becomes a big data problem in itself. In this talk we’ll cover the aspects you should take into consideration when monitoring a distributed system built with tools like web services, Spark, Cassandra, MongoDB, and AWS. Beyond the tools, what should you monitor about the actual data that flows through the system? We’ll cover the simplest solution, built with your day-to-day open source tools; the surprising thing is that it does not come from an Ops guy.
For many use cases, such as fraud detection or reacting to sensor data, the response times of traditional batch processing are simply too slow. To react to such events in close to real time, we need to go beyond classical batch processing and use stream processing systems such as Apache Spark Streaming, Apache Flink, or Apache Storm. But these systems are not sufficient by themselves. One common example of such a fast data pipeline is the SMACK stack: Apache Spark, Mesos, Akka, Cassandra, and Kafka.
ScyllaDB is a NoSQL database compatible with Apache Cassandra, distinguishing itself by supporting millions of operations per second, per node, with predictably low latency, on similar hardware. Achieving such speed requires a great deal of diligent, deliberate mechanical sympathy: ScyllaDB employs a totally asynchronous, share-nothing programming model, relies on its own memory allocators, and meticulously schedules all its IO requests. In this talk we will go over the low-level details of all the techniques involved, from a log-structured memory allocator to an advanced cache design, covering how they are implemented and how they fully utilize the hardware resources they target.
The CAP theorem points to unavoidable tradeoffs between consistency and availability when the network can partition. This decision heavily impacts system performance and cost. Current database design forces application developers to decide early in the design cycle, and once and for all, where they sit in this spectrum. At one extreme, strong consistency, as in Spanner or CockroachDB, requires frequent global coordination; restricting concurrency in this way greatly simplifies application development, but it reduces availability and increases latency. At the opposite extreme, systems such as Riak or Cassandra provide eventual consistency only: they never sacrifice availability, but application developers must write code to deal with all sorts of concurrency anomalies in order to prevent violation of application invariants. However, a system only needs to be consistent enough for the application to remain correct. We propose a unique middle ground, Just-Right Consistency (JRC), composed of various techniques that do not sacrifice availability, unless provably required for the application to execute correctly. We overview JRC, and present an open-source cloud-scale database built for it, Antidote. Antidote stores Conflict-Free Replicated Data Types (CRDTs) under Transactional Causal Consistency (TCC), the strongest model that does not compromise availability. Optionally, a transaction can be ACID, but Antidote keeps availability high by moving the required coordination outside the common path. Finally, we leverage research tools that help developers use ACID properties selectively, only when necessary for correctness.
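To make the CRDT idea mentioned above concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest Conflict-Free Replicated Data Types. It is an illustration only and is not Antidote's API; the class and method names are hypothetical. Each replica increments its own slot, and replicas converge by taking the per-replica maximum when they merge, so no coordination is needed.

```python
class GCounter:
    """Grow-only counter CRDT (illustrative sketch, not Antidote's API)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        # Maps replica id -> number of increments observed from that replica.
        self.counts = {}

    def increment(self, amount=1):
        # A G-Counter can only grow; each replica touches only its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        # Element-wise maximum: commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    @property
    def value(self):
        return sum(self.counts.values())


# Two replicas accept writes independently, then exchange state.
a = GCounter("replica-a")
b = GCounter("replica-b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
```

After the exchange both replicas report the same total, even though neither waited for the other before accepting writes.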
In hardly any other area have we seen as much change and improvement as in the field of databases. Just a few years ago everyone had "their" database that was used for each project. Today you are confronted with a variety of approaches and implementations. We start off with a brief look at the theoretical background of distributed systems and databases in particular. On the basis of this, we take a look at traditional relational databases such as PostgreSQL and MySQL. Additionally, we dive into newer NoSQL systems like MongoDB, Redis, Cassandra, or Elasticsearch. After that, we discuss possible scenarios as well as the advantages and disadvantages of several databases:
* Why SQL is in fashion (again).
* Why MongoDB's document structure fits object-oriented programming so well.
* How you can capture visitor hits with Redis efficiently.
* Why Cassandra is so scalable and fail-safe.
* How full-text search works with Elasticsearch.
The wide range of possibilities hasn't made the right choice of database(s) any easier, but it has made it all the more interesting!
Apache Cassandra is a scalable database with high-availability features, but these come with severe limitations in terms of querying capabilities. Since the introduction of SASI in Cassandra 3.4, those limitations belong to the past. Now you can create performant indexes on your columns and benefit from **full text search** capabilities with the new `LIKE '%term%'` syntax. To illustrate how SASI works, we'll use a database of 100,000 albums and artists. We'll also show how SASI can help accelerate analytics scenarios with Spark through SparkSQL predicate pushdown.
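As a rough illustration of the SASI syntax described above, the following sketch builds the two CQL statements involved: creating a SASI index in `CONTAINS` mode and running a `LIKE` query against it. The keyspace, table, column, and index names are hypothetical and not taken from the talk, and the helpers only produce CQL strings; executing them against a cluster is left to whatever driver you use.

```python
# Fully qualified class names used by Cassandra's SASI implementation.
SASI_CLASS = "org.apache.cassandra.index.sasi.SASIIndex"
NON_TOKENIZING = "org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer"


def create_sasi_index(keyspace, table, column, mode="CONTAINS", case_sensitive=False):
    """Return a CREATE CUSTOM INDEX statement for a SASI index.

    The index name (<table>_<column>_sasi) is a hypothetical convention.
    CONTAINS mode is what allows '%term%' (substring) matching.
    """
    options = {
        "mode": mode,
        "analyzer_class": NON_TOKENIZING,
        "case_sensitive": str(case_sensitive).lower(),
    }
    opts = ", ".join(f"'{k}': '{v}'" for k, v in options.items())
    return (
        f"CREATE CUSTOM INDEX IF NOT EXISTS {table}_{column}_sasi "
        f"ON {keyspace}.{table} ({column}) "
        f"USING '{SASI_CLASS}' WITH OPTIONS = {{{opts}}};"
    )


def like_query(keyspace, table, column, term):
    """Return a SELECT that matches rows whose column contains `term`."""
    # CQL escapes a single quote inside a string literal by doubling it.
    escaped = term.replace("'", "''")
    return f"SELECT * FROM {keyspace}.{table} WHERE {column} LIKE '%{escaped}%';"


index_cql = create_sasi_index("music", "albums", "title")
query_cql = like_query("music", "albums", "title", "love")
print(index_cql)
print(query_cql)
```

With the default `PREFIX` mode only `LIKE 'term%'` is supported, which is why the sketch defaults to `CONTAINS` for substring search.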
IoT has massive potential, and its impact on our daily lives is significant. Here is an example of using a connected object and analysing its data: we'll see how to collect data using the accelerometer sensor of your smartphone. Then we'll store it in Cassandra using a time series model. Finally, we'll analyse the data and predict the activity with Spark. A live demo on stage will show this solution working in real time.
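One common way to model such sensor data as a time series in Cassandra is to partition by device and day and cluster rows by timestamp. The sketch below assumes that layout; the table definition, field names, and the daily bucket size are illustrative assumptions, not the schema used in the talk. The Python part only shows how readings map onto partitions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical table: one partition per (device, day), newest rows first.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS iot.accelerometer (
    device_id text,
    day date,
    ts timestamp,
    x double,
    y double,
    z double,
    PRIMARY KEY ((device_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""


@dataclass(frozen=True)
class Reading:
    device_id: str
    ts: datetime  # expected to be timezone-aware
    x: float
    y: float
    z: float


def partition_key(reading):
    """Return the (device_id, UTC day) bucket a reading belongs to."""
    return (reading.device_id, reading.ts.astimezone(timezone.utc).date())


def group_by_partition(readings):
    """Group readings per partition, sorted newest first like the table."""
    partitions = defaultdict(list)
    for r in readings:
        partitions[partition_key(r)].append(r)
    for rows in partitions.values():
        rows.sort(key=lambda r: r.ts, reverse=True)
    return dict(partitions)


readings = [
    Reading("phone-1", datetime(2016, 5, 1, 10, 0, 0, tzinfo=timezone.utc), 0.1, 0.0, 9.8),
    Reading("phone-1", datetime(2016, 5, 1, 10, 0, 1, tzinfo=timezone.utc), 0.2, 0.1, 9.7),
    Reading("phone-1", datetime(2016, 5, 2, 8, 30, 0, tzinfo=timezone.utc), 1.5, 0.3, 9.1),
]
partitions = group_by_partition(readings)
```

Bucketing by day keeps each partition bounded in size while still letting a query for one device and one day read a single partition in timestamp order.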
Akka is a highly concurrent, reactive, actor-based programming model designed to make it easier to build distributed systems. It is very popular in Java, and now, with the port of Akka to the .NET platform, all of its power is available in C# and F#. In this talk we will give a (brief) review of what actor programming is and what Akka.NET (which has already graduated with its 1.0 release) brings, and we will also look at some practical examples, including modules that have recently been ported to .NET (persistence, for example, with databases such as MongoDB or Cassandra). We will also see how well it integrates with Web API and SignalR in a web application built with AngularJS. The .NET ecosystem keeps changing! http://2015.codemotion.es/agenda.html#5677904553836544/43864003
Watch the talk at Big Data Spain: http://www.bigdataspain.org/program/thu/slot-9.html Andrés works for Stratio as a Big Data Software Architect, involved in the development of its Big Data platform. He is the main designer and developer of Stratio's Cassandra Lucene Index, a plugin that uses Lucene to extend C* index functionality, providing near-real-time search comparable to Elasticsearch or Solr and speeding up Spark jobs. http://www.bigdataspain.org Big Data Spain 2015 Conference, 15th-16th Oct 2015, Kinépolis Madrid. Event promoted by: http://www.paradigmadigital.com