Apache programming resources
Apache Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines. Workflows are defined programmatically as directed acyclic graphs (DAGs) of tasks, written in Python. At Idealista we use it on a daily basis for data ingestion pipelines. We’ll take a thorough look at managing dependencies, handling retries, alerting, etc., as well as all the drawbacks.
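To make the idea of a DAG of tasks with dependencies and retries concrete, here is a minimal sketch that uses only the Python standard library. It is not Airflow's API: the `run_dag` helper, the task names and the `max_retries` parameter are assumptions made up for illustration, and a real scheduler would also handle scheduling, parallelism and alerting.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, upstream, max_retries=2):
    """Run callables in dependency order, retrying each failed task.

    tasks:    hypothetical mapping of task name -> zero-argument callable
    upstream: mapping of task name -> set of task names it depends on
    """
    finished = []
    # static_order() yields every task after all of its upstream tasks.
    for name in TopologicalSorter(upstream).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                finished.append(name)
                break
            except Exception:
                # Give up (and stop the whole run) once retries are exhausted.
                if attempt == max_retries:
                    raise
    return finished


# Hypothetical three-step ingestion pipeline; "transform" fails once, then succeeds.
attempts = {"transform": 0}


def extract():
    pass


def transform():
    attempts["transform"] += 1
    if attempts["transform"] < 2:
        raise RuntimeError("transient failure")


def load():
    pass


tasks = {"extract": extract, "transform": transform, "load": load}
upstream = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
order = run_dag(tasks, upstream)
```

In this sketch a task that keeps failing aborts the run, so downstream tasks never execute against missing data.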
How about listening to it on your commute or while you work out? https://www.ivoox.com/46590288 ------------- In this talk I will try to translate the complex legal texts of software licenses into plain human language. I will go through the most widely used licenses, such as GPL v3, Mozilla, Apache, MIT, etc., clarifying the differences between them and, above all, explaining what they oblige you to do, what steps you have to follow to comply with them, and the consequences of not doing so. ------------- All the Commitconf 2019 videos at: https://lk.autentia.com/Commit19-YouTube Get to know Autentia! Twitter: https://goo.gl/MU5pUQ Instagram: https://lk.autentia.com/instagram LinkedIn: https://goo.gl/2On7Fj/ Facebook: https://goo.gl/o8HrWX
The coming decade promises to be extremely exciting for astronomers and data/computer scientists alike with the arrival of the Large Synoptic Survey Telescope (LSST), the James Webb Space Telescope, and others. These projects will produce huge amounts of data that need to be searched, correlated, analyzed and learned from in order to find answers to questions such as “What are Dark Energy and Dark Matter?”, “How did our Universe form?”, “How many Earth-threatening asteroids are out there?” LSST with its unique architecture will go both “wide” and “deep”, meaning that it will acquire images of large parts of the sky, capturing the most distant galaxies. It will continually scan the visible sky over a period of 10 years and will produce the first video of the Universe in history. These new and exciting times require new tools that will help astronomers perform these analytical tasks more efficiently. In collaboration with astronomers from the University of Washington I built AXS, Astronomy Extensions for Spark, a tool based on Apache Spark, designed for fast cross-matching of astronomical catalogs and easy astronomical data processing. In this talk I will go through the details of AXS’ architecture and explain why it is so fast. #BIGTH19 #Analytics #MachineLearning #Spark Session presented at Big Things Conference 2019 by Petar Zečević, CTO at SV Group. 20th November 2019, Kinépolis, Madrid. Do you want to know more? https://www.bigthingsconference.com/
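For readers unfamiliar with catalog cross-matching, the sketch below shows what the operation means: pairing each object in one catalog with the nearest object in another within a given angular radius. It is a brute-force illustration only and is not how AXS works (the abstract does not describe its algorithm); the function names, the `(id, ra, dec)` tuple layout and the sample coordinates are all assumptions.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(catalog_a, catalog_b, radius_deg):
    """Return (id_a, id_b) pairs: nearest catalog_b object within radius_deg.

    Each catalog is a list of hypothetical (id, ra_deg, dec_deg) tuples.
    This compares every pair, so the cost grows as len(a) * len(b).
    """
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        best = None  # (id_b, separation) of the closest candidate so far
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0]))
    return matches


# Made-up sample data: a1 and b1 lie well within one arcsecond of each other.
catalog_a = [("a1", 10.0, 20.0), ("a2", 150.0, -30.0)]
catalog_b = [("b1", 10.0002, 20.0001), ("b2", 200.0, 5.0)]
matches = cross_match(catalog_a, catalog_b, radius_deg=1 / 3600)
```

The quadratic cost of this naive version is the reason survey-scale catalogs need a dedicated, distributed approach.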
In this talk, Theofilos Kakantousis presents TFX on Hopsworks, a fully open-source platform for running TFX pipelines on any cloud or on-premises. Hopsworks is a project-based multi-tenant platform for both data parallel programming and horizontally scalable machine learning pipelines. Hopsworks supports Apache Flink as a runner for Beam jobs, and TFX pipelines are supported through Airflow support in Hopsworks. We will demonstrate how to build an ML pipeline with TFX, Beam’s Python API and the Flink Runner by using Jupyter notebooks, explain how security is transparently enabled with short-lived TLS certificates, and go through all the pipeline steps, from Data Validation to Transformation, Model Training with TensorFlow, Model Analysis, Model Serving and Monitoring with Kubernetes. #BIGTH19 #BigData #DeepLearning Session presented at Big Things Conference 2019 by Theofilos Kakantousis, Data Engineer & COO at Logical Clocks. 21st November 2019, Kinépolis, Madrid. Do you want to know more? https://www.bigthingsconference.com/
According to Wikipedia, an Event-driven Architecture is a software architecture pattern that promotes the production, detection, consumption of, and reaction to events. There is a perfect pairing between microservice-based architectures, Domain Driven Design (DDD) and event-driven architectures. In this conference we will review which design principles are the catalyst for this symbiosis, as well as practical examples in different areas including governance. Many business use cases can be articulated on top of these principles, abstracting them from both complexity and variability in the technological stack. As a good part of the audience will already be dealing with events and microservices, we will also explain other key concepts: - Designing a future-proof event taxonomy. - Strategies for event enrichment, starting with the definition of that concept. - Managing correlation or inference of events. - Benefits of an event schema registry, using for example Apache Avro. - Traceability of events by design. - Data reconciliation patterns, and when to avoid them. We will also take the opportunity to discuss common challenges (and others not that common), frequent mistakes and how to avoid or mitigate them. To conclude, we will explain some use cases that we are solving superbly based on real-time events: Communications, Order Management, Business Activity Monitoring (BAM), KYC, GDPR... ------------- All the Commitconf 2019 videos at: https://lk.autentia.com/Commit19-YouTube
In this talk we discovered how to simplify data analysis over the cloud with Apache Kylin. Apache Kylin delivers game-changing extreme augmented OLAP technology for analyzing data instantly at petabyte scale, and it has been adopted by thousands of organizations worldwide. Founded by the creators of Apache Kylin, Kyligence is on a mission to accelerate the productivity of its customers by automating data management, discovery, interaction, and insight generation, all without barriers. Kyligence provides an AI-augmented data platform, powered by Apache Kylin, for analysts and data engineers to build and manage their data services from on-premises to multi-cloud. Session presented at Big Things Conference 2019 by Luke Han, Co-founder and CEO of Kyligence. 20th November 2019, Kinépolis, Madrid
Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions and scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs. Session presented at Big Things Conference 2019 by Michael Armbrust, Principal Engineer at Databricks. 20th November 2019, Kinépolis, Madrid
As our codebase grows, so does its complexity. Code is becoming harder to read, test, debug, maintain… such a mess! Let’s go back to the cool ’80s and start a journey to discover a different way to approach software development: Functional Programming. We’ll see how FP can actually save hours of debugging and improve our productivity while writing a complex frontend JavaScript application or a huge backend distributed system in any programming language. You’ll never write an impure function again! About: Michele Riva, Sr. Software Engineer - Openmind. Michele discovered his passion for software development by building an app to make funny jokes about his friends... and professors. Today, "fun" and "development" are still part of his life while working as a Software Engineer at Openmind and contributing to some of the biggest open source projects from different companies (Facebook, Apache, Node.js Foundation) in different programming languages (Haskell, Erlang, Go, Node). He strongly believes in shared knowledge, and he writes tons of public domain articles about JavaScript, Functional Programming and performance enhancements on jsmonday.dev.
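As a small illustration of the pure versus impure distinction the abstract alludes to, here is a sketch in Python (the talk itself targets JavaScript and other languages). The function names and the shopping-cart scenario are assumptions chosen for the example.

```python
from functools import reduce

# Impure: the result depends on, and mutates, hidden module-level state.
cart_total = 0


def add_to_total_impure(price):
    global cart_total
    cart_total += price
    return cart_total


# Pure: the result depends only on the arguments and nothing else is modified.
def add_to_total_pure(total, price):
    return total + price


# The same call gives different answers for the impure version...
first_impure = add_to_total_impure(5)
second_impure = add_to_total_impure(5)

# ...while the pure version always returns the same value for the same inputs.
first_pure = add_to_total_pure(0, 5)
second_pure = add_to_total_pure(0, 5)

# Pure functions compose cleanly, for example as the step of a fold.
prices = [5, 10, 20]
pure_sum = reduce(add_to_total_pure, prices, 0)
```

Because the pure version has no hidden state, it can be tested with plain input and output checks and called in any order.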
Apache Kafka is the de facto standard streaming data processing platform, being widely deployed as a messaging system, a robust data integration framework (Kafka Connect) and a stream processing API (Kafka Streams). But there's more: "Look ma! No Java!" Filtering one stream of data into another, creating derived columns, even joining two topics together: it's all possible with KSQL. Come to this talk for a thorough overview of KSQL. There'll be plenty of live coding on streaming data to clearly illustrate KSQL's awesomeness! About: Ugo Landini, Systems Engineer, Confluent. Ugo Landini helps companies build the best enterprise-class streaming data platform using Apache Kafka, enabling real-time decision-making at scale. He is an avid open source supporter and is strongly convinced that sharing knowledge is not only a must but also an opportunity for personal growth: co-founder of the JUG Roma & Codemotion, he is an Apache committer, has developed lots of different things in a plethora of different languages and is still convinced he can play decent football.