Apache

Apache programming resources
APIs are the glue that holds our information systems together. If you run more than a couple of apps, having each of them implement authentication and the like is going to be an Ops nightmare. You definitely need a central point of management: an API Gateway. As developers, we live more and more in an interconnected world. Perhaps you’re developing microservices? Maybe you’re exposing your APIs on the web? In all cases, we are fortunate in the Java world to have plenty of libraries that help us manage related concerns: rate limiting, authentication, service discovery, you name it. Yet these concerns are cross-cutting. They impact all our applications in the same way, so libraries may not be the optimal way to handle them. API Gateways are a popular and nowadays quite widespread way to move these concerns out of the applications and into a central place. In this talk, I’ll describe some of these concerns in more detail and how you can benefit from an API Gateway. Then, I’ll list some of the solutions available on the market. Finally, I’ll demo APISIX, an Apache project built on top of NGINX that offers quite a few features to ease your development.
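To make the gateway idea concrete, here is a minimal sketch (not from the talk) of declaring a rate-limited, key-authenticated route through the APISIX Admin API from Python. The admin address, admin key, route id and upstream node are all placeholder assumptions.

```python
# Hypothetical sketch: register a route in APISIX with rate limiting and key auth.
# Admin port (9180 in recent releases, 9080 in older ones), API key and upstream are placeholders.
import requests

ADMIN = "http://127.0.0.1:9180/apisix/admin"
HEADERS = {"X-API-KEY": "your-admin-key"}  # placeholder admin key

route = {
    "uri": "/orders/*",
    "upstream": {"type": "roundrobin", "nodes": {"orders-service:8080": 1}},
    "plugins": {
        # Reject more than 100 requests per client IP every 60 seconds.
        "limit-count": {
            "count": 100,
            "time_window": 60,
            "rejected_code": 429,
            "key_type": "var",
            "key": "remote_addr",
        },
        # Require a valid consumer key before the request reaches the service.
        "key-auth": {},
    },
}

resp = requests.put(f"{ADMIN}/routes/1", json=route, headers=HEADERS, timeout=5)
resp.raise_for_status()
print(resp.json())
```

The point of the example is the shape of the solution: authentication and rate limiting live in the gateway configuration, not in each application's code.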
In this third episode in collaboration with Confluent, we talk about what Stream Processing is and discover that it is much more than processing millions of events per second. You can find the rest of the episodes in this series at the following links: https://www.ivoox.com/que-es-como-funciona-apache-kafka-audios-mp3_rf_81153210_1.html https://www.ivoox.com/descubriendo-kafka-confluent-primeros-pasos-audios-mp3_rf_75587433_1.html Speakers: Sergio Durán Vegas, Head of Solutions Engineering for Spain and Portugal at Confluent. Jesús Pau de la Cruz, Software Architect at Paradigma Digital. Want to listen to our podcasts? https://www.ivoox.com/podcast-apasionados-tecnologia_sq_f11031082_1.html Want to watch other tutorials? https://www.youtube.com/c/ParadigmaDigital/playlists Want to know which events we are organizing next? https://www.paradigmadigital.com/eventos/
In today's episode we talk about Apache Kafka. We look at what it is, how it works and the different distributions available. We also cover use cases where Kafka can help and the benefits Confluent brings compared with other distributions. If you missed our first episode about the Confluent ecosystem, or want to listen to it again, here is the link: https://www.ivoox.com/descubriendo-kafka-confluent-primeros-pasos-audios-mp3_rf_75587433_1.html Speakers: Víctor Rodríguez, Solutions Architect at Confluent. Jesús Pau, Software Architect at Paradigma Digital. To make sure you never miss a video tutorial, subscribe to our channel for all the latest on technology, digital transformation, events and much more. https://www.youtube.com/user/ParadigmaTe?sub_confirmation=1 Want to watch other tutorials? https://www.youtube.com/c/ParadigmaDigital/playlists Want to listen to our podcasts? https://www.ivoox.com/podcast-apasionados-tecnologia_sq_f11031082_1.html Want to know which events we are organizing next? https://www.paradigmadigital.com/eventos/
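For readers who want to try the basics discussed in the episode, here is a minimal sketch of publishing and reading an event with the confluent-kafka Python client; the broker address and topic name are assumptions for illustration.

```python
# Minimal sketch: produce one event to a topic and read it back.
# Assumes a broker at localhost:9092 and a topic named "orders".
from confluent_kafka import Producer, Consumer

producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("orders", key="order-42", value='{"amount": 19.99}')
producer.flush()  # block until the broker acknowledges the message

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "orders-reader",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])

msg = consumer.poll(10.0)  # wait up to 10 seconds for a record
if msg is not None and not msg.error():
    print(msg.key(), msg.value())
consumer.close()
```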
“Data comes at us fast” is what they say. In fact, the last couple of years taught us how to successfully cleanse, store, retrieve, process, and visualize large amounts of data in a batch or streaming way. Despite these advances, data sharing has been severely limited because sharing solutions were tied to a single vendor, did not work for live data, came with severe security issues, and did not scale to the bandwidth of modern cloud object stores. Conferences have been filled for many years with sessions about how to architect applications and master the APIs of your services, but recent events have shown a huge business demand for sharing massive amounts of live data in the most direct, scalable way possible. One example is open data sets of genomic data shared publicly for the development of vaccines. Still, many commercial use cases share news, financial or geological data with a restricted audience where the data has to be secured. In this session, dive deep into an open source solution for sharing massive amounts of live data in a cheap, secure, and scalable way. Delta Sharing is an open source project donated to the Linux Foundation. It uses an open REST protocol to secure the real-time exchange of large data sets, enabling secure data sharing across products for the first time. It leverages modern cloud object stores, such as S3, ADLS, or GCS, to reliably transfer large data sets. There are two parties involved: Data Providers and Recipients. The data provider decides what data to share and runs a sharing server. An open-sourced reference sharing server is available to get started with sharing Apache Parquet or Delta Lake (delta.io) tables. Any client supporting pandas, Apache Spark™, Rust, or Python can connect to the sharing server. Clients always read the latest version of the data, and they can provide filters on the data (e.g., “country=ES”) to read a subset of it. Since the data is presented as pandas or Spark dataframes, the integration with ML frameworks such as MLflow or SageMaker is seamless.
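As a small illustration of the recipient side, here is a hedged sketch using the open source delta-sharing Python client; the profile file name and the share/schema/table identifiers are placeholders, not real shares.

```python
# Minimal sketch: read a shared table with the delta-sharing Python client.
# "config.share" is the credentials file the data provider issues; share/schema/table names are placeholders.
import delta_sharing

profile = "config.share"
table_url = f"{profile}#retail.sales.orders"   # format: <profile-file>#<share>.<schema>.<table>

client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())                # discover what the provider exposes

df = delta_sharing.load_as_pandas(table_url)   # latest snapshot as a pandas DataFrame
print(df[df["country"] == "ES"].head())        # work with the subset you care about
```

For Spark users there is an equivalent `load_as_spark` entry point, which is what makes the downstream MLflow or SageMaker integration feel seamless.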
As individuals, we use time series data in everyday life all the time; if you’re trying to improve your health, you may track how many steps you take daily and relate that to your body weight or size over time to understand how well you’re doing. This is clearly a small-scale example, but on the other end of the spectrum, large-scale time series use cases abound in our current technological landscape. Be it tracking the price of a stock or cryptocurrency that changes every millisecond, performance and health metrics of a video streaming application, sensors reading temperature, pressure and humidity, or the information generated from millions of IoT devices, modern digital applications require collecting, storing, and analyzing time series data at extreme scale, and with performance that a relational database simply cannot provide.

We have all seen very creative solutions built to work around this problem, but as throughput needs increase, scaling them becomes a major challenge. To get the job done, developers end up landing, transforming, and moving data around repeatedly, using multiple components pipelined together. Looking at these solutions really feels like looking at Rube Goldberg machines. It’s staggering to see how complex architectures become in order to satisfy the needs of these workloads. Most importantly, all of this is something that needed to be built, managed, and maintained, and it still doesn’t meet very high scale and performance needs. Many time series applications can generate enormous volumes of data. One common example here is video streaming. Delivering high quality video content is a very complex process, and understanding load latency, video frame drops, and user activity needs to happen at massive scale and in real time. This process alone can generate several GBs of data every second, while easily running hundreds of thousands, sometimes over a million, queries per hour. A relational database certainly isn’t the right choice here, which is exactly why we built Timestream at AWS.

Timestream started out by decoupling data ingestion, storage, and query such that each can scale independently. The design keeps each sub-system simple, making it easier to achieve unwavering reliability, while also eliminating scaling bottlenecks and reducing the chances of correlated system failures, which becomes more important as the system grows. At the same time, in order to manage overall growth, the system is cell based: rather than scale the system as a whole, we segment it into multiple smaller copies of itself, so that these cells can be tested at full scale and a problem in one cell can’t affect activity in any of the other cells.

In this session I will describe the time-series problem, look at some architectures that have been used in the past to work around it, and then introduce Amazon Timestream, a purpose-built, fully managed database to process and analyze time-series data at scale. I will demo how it can be used to ingest and process time-series data, and how it can be easily integrated with open source tools like Apache Flink or Grafana.
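To show what "ingest and query" looks like in practice, here is a hedged sketch using boto3; the region, database, table, dimensions and measure names are placeholders rather than anything from the session.

```python
# Minimal sketch: write one record to Amazon Timestream and run an aggregation query.
# Region, database, table and measure names are placeholders.
import time
import boto3

write = boto3.client("timestream-write", region_name="us-east-1")
query = boto3.client("timestream-query", region_name="us-east-1")

now_ms = str(int(time.time() * 1000))
write.write_records(
    DatabaseName="video_metrics",
    TableName="playback",
    Records=[{
        "Dimensions": [{"Name": "device_id", "Value": "tv-123"}],
        "MeasureName": "frame_drops",
        "MeasureValue": "3",
        "MeasureValueType": "BIGINT",
        "Time": now_ms,
        "TimeUnit": "MILLISECONDS",
    }],
)

# Aggregate the last hour of data per device.
result = query.query(
    QueryString='SELECT device_id, avg(measure_value::bigint) AS avg_drops '
                'FROM "video_metrics"."playback" WHERE time > ago(1h) GROUP BY device_id'
)
print(result["Rows"])
```

Note how ingestion and querying go through separate clients, which mirrors the decoupled write/storage/query design described above.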
CDC is a set of patterns that lets us detect changes in a data source and act on them. In this webinar we look at one of the reactive implementations of CDC, based on Debezium, which lets us replicate the changes produced on a legacy system based on DB2 and Oracle to an Apache Kafka event bus in real time, with the goal of enabling a digital transformation of the current system. Repository: https://github.com/paradigmadigital/debezium Who are the speakers? Jesús Pau de la Cruz. I am a Computer Engineer from Universidad Rey Juan Carlos and I love technology and the possibilities it offers the world. Interested in designing real-time solutions, distributed and scalable architectures, and cloud environments. I currently work at Paradigma as a Software Architect. José Alberto Ruiz Casarrubios. Computer engineer by vocation, technology all-rounder and tireless learner. I am always looking for new challenges where I can try to contribute the best solution. Fully immersed in the world of software development and system modernization. A believer that applying common sense is the best of methodologies and decisions.
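As a hedged illustration of how such a pipeline is wired up, here is a sketch that registers a Debezium connector through the Kafka Connect REST API; the hostnames, credentials, schema/table names and topic prefix are placeholders and not taken from the webinar's repository.

```python
# Hypothetical sketch: register a Debezium Oracle connector via the Kafka Connect REST API,
# so row-level changes in a legacy schema are streamed into Kafka topics.
# For DB2 the connector class would be io.debezium.connector.db2.Db2Connector.
import requests

connector = {
    "name": "legacy-oracle-cdc",
    "config": {
        "connector.class": "io.debezium.connector.oracle.OracleConnector",
        "database.hostname": "legacy-db.internal",   # placeholder host
        "database.port": "1521",
        "database.user": "debezium",
        "database.password": "CHANGE_ME",
        "database.dbname": "ORCLCDB",
        "topic.prefix": "legacy",                     # topics become legacy.<schema>.<table>
        "table.include.list": "SALES.ORDERS",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.legacy",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector, timeout=10)
resp.raise_for_status()
print(resp.json())
```

Once the connector is running, every insert, update and delete on the included tables appears as an event on the Kafka bus, ready for downstream consumers.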
One of the architectures growing in adoption thanks to the popularity of microservices is Event-Driven Architecture (EDA). Using patterns such as Event Sourcing and Event Collaboration, it decouples microservices and makes them easier to operate. However, just as with synchronous communication, there must be agreements between consumers and producers to guarantee that compatibility is not broken. In this talk, Antón shares his experience building this kind of architecture and, in particular, the problems he has faced when governing those agreements in architectures that span several data centers and different clouds. He walks through the journey of integrating Kafka, Azure EventHub and Google PubSub using technologies such as Kafka Connect and Google Dataflow. #About the speaker (Antón R. Yuste) I am a Principal Software Engineer focused on Event Streaming and Real-Time Processing. I have experience working with different message brokers and event streaming platforms (Apache Kafka, Apache Pulsar, Google Pub/Sub and Azure EventHub) and real-time processing frameworks (Flink, Kafka Streams, Spark Structured Streaming, Google Dataflow, Azure Stream Analytics, etc.). During my career, I have specialized in building internal SaaS platforms in big corporations to make complex technologies easy for teams to use and adopt, so they can build solutions to real business use cases. From the very beginning, I can help with governance, operation, performance, adoption, training and any task related to system administration or backend development.
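One common way to encode the producer/consumer agreements the talk discusses is an explicit schema checked against a registry at produce time. The sketch below uses Confluent Schema Registry with Avro and the confluent-kafka client; the registry URL, topic and schema are illustrative assumptions, not necessarily the stack described in the talk.

```python
# Hedged sketch: publish an event whose schema is registered and compatibility-checked
# by a schema registry, so incompatible changes fail fast instead of breaking consumers.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = """
{
  "type": "record",
  "name": "OrderPlaced",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})  # placeholder registry
serializer = AvroSerializer(registry, schema_str)

producer = Producer({"bootstrap.servers": "localhost:9092"})
event = {"order_id": "42", "amount": 19.99}
producer.produce(
    topic="orders.placed",
    value=serializer(event, SerializationContext("orders.placed", MessageField.VALUE)),
)
producer.flush()
```

The governance question in the talk is essentially how to manage these contracts consistently when the events flow across Kafka, EventHub and PubSub rather than a single cluster.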
This talk examines business perspectives on the Ray Project from RISELab, hailed as a successor to Apache Spark. Ray is a simple-to-use open source library in Python or Java which provides multiple patterns for distributed systems: mix and match as needed for a given business use case, without tight coupling of applications with underlying frameworks. Warning: this talk may change the way your organization approaches AI. #BIGTH20 #RayProject Session presented at Big Things Conference 2020 by Paco Nathan, Managing Partner at Derwen, 16th November 2020, Home Edition.
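To give a flavor of the "multiple patterns" mentioned above, here is a minimal sketch of Ray's two core primitives, tasks and actors; the function and class names are invented for illustration.

```python
# Minimal sketch: ordinary Python code becomes distributed work via Ray's @ray.remote decorator.
import ray

ray.init()  # starts a local cluster; use ray.init(address="auto") to join an existing one

@ray.remote
def square(x):
    # A stateless task, scheduled in parallel across workers.
    return x * x

@ray.remote
class Counter:
    # An actor: a stateful worker process that handles method calls.
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value
        return self.total

futures = [square.remote(i) for i in range(8)]
results = ray.get(futures)
print(results)

counter = Counter.remote()
ray.get([counter.add.remote(v) for v in results])
print(ray.get(counter.add.remote(0)))  # running total held by the actor
```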
Machine Learning (ML) is separated into model training and model inference. ML frameworks typically use a data lake like HDFS or S3 to process historical data and train analytic models. Model inference and monitoring at production scale in real time is another common challenge when using a data lake. But it’s possible to completely avoid such a data store by using an event streaming architecture. This talk compares the modern approach to traditional batch and big data alternatives and explains benefits like the simplified architecture, the ability to reprocess events in the same order for training different models, and the possibility of building a scalable, mission-critical ML architecture for real-time predictions with far fewer headaches and problems. The talk explains how this can be achieved leveraging Apache Kafka, Tiered Storage and TensorFlow. Session presented at Big Things Conference 2020 by Kai Waehner, Field CTO at Confluent, 18th November 2020, Home Edition. Do you want to know more? https://www.bigthingsconference.com/
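To make the streaming-inference idea concrete, here is a hedged sketch in which events are consumed from Kafka and scored by a pre-trained TensorFlow model as they arrive, with no intermediate data lake; the broker, topics, model path and feature names are placeholders.

```python
# Hedged sketch: real-time model inference directly on a Kafka stream.
# Assumes a broker at localhost:9092, topics "payments" / "payments.scored",
# and a pre-trained Keras model saved at models/fraud_detector.
import json
import numpy as np
import tensorflow as tf
from confluent_kafka import Consumer, Producer

model = tf.keras.models.load_model("models/fraud_detector")

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "fraud-scoring",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["payments"])
producer = Producer({"bootstrap.servers": "localhost:9092"})

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    # Feature names are illustrative; a real pipeline would share this logic with training.
    features = np.array([[event["amount"], event["merchant_risk"]]], dtype=np.float32)
    score = float(model.predict(features, verbose=0)[0][0])
    producer.produce("payments.scored", value=json.dumps({**event, "fraud_score": score}))
    producer.poll(0)  # serve delivery callbacks
```

Because the same topic can be replayed from the beginning, the identical event order is available later for training or comparing new models, which is one of the benefits the talk highlights.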
The vast majority of the content created on the Internet every day is unstructured, and roughly 90% of it is text. In the era of the collaborative web we use language constantly, for example to write a product review, comment on a photo or post a tweet. In this talk we look at some of the tools the Python ecosystem offers for understanding, structuring and extracting value from text, and we see how the approach to text processing tasks has evolved in recent years up to the current trend based on Transfer Learning. We do so through a concrete use case: detecting offensive comments or insults aimed at other users on social networks and forums. Bio: Rafa Haro currently works as a Search Architect at Copyright Clearance Center. Over more than 14 years of software development experience, he has worked mainly at companies related to Natural Language Processing, Semantic Technologies and Intelligent Search. He also participates actively in several open source communities such as the Apache Software Foundation, where he is a committer and PMC member of two projects: Apache Stanbol and Apache Manifold.
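As a hedged illustration of the Transfer Learning approach for this use case, here is a sketch using the Hugging Face transformers pipeline with a pre-trained toxicity classifier; the model id is only illustrative (any multilingual or Spanish offensive-language model from the Hub could be swapped in), and it is not necessarily the one used in the talk.

```python
# Hedged sketch: score comments for toxicity/offence with a pre-trained Transformer.
# The model id below is an assumption for illustration purposes.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="unitary/multilingual-toxic-xlm-roberta",  # illustrative multilingual toxicity model
)

comments = [
    "Gracias por compartir, muy útil.",
    "Eres un completo inútil, no opines más.",
]
for comment, result in zip(comments, classifier(comments)):
    print(f"{result['label']:>10} ({result['score']:.2f})  {comment}")
```

The key point is that no task-specific training is needed to get started: a model pre-trained on large corpora is reused and, if necessary, fine-tuned on a small labeled set of comments.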