python

Python programming resources
Do Data Scientists have a future in the era of Machine Learning automation? by Nerea Luis Mingueza at Big Things Conference. In Artificial Intelligence, the figure of the "Data Scientist" quickly found its niche in companies: creating, training and experimenting with data and models. All of this work was done manually, working with Jupyter Notebooks and Python libraries… and a lot of patience! A few years ago, the academic world raised this hypothesis: what if we train a model whose purpose is to choose the parameters that optimize the performance of the model we are training? This hypothesis has materialized in the trend known as AutoML: a multitude of libraries and processes that promise to automate model configuration and training, experiment reproducibility and metric generation, so that we can focus purely on decision-making. To what extent is this true? Let's talk about the origins of AutoML, how it has evolved and the proliferation of low-code tools, products, frameworks and libraries that are changing the rules of the game. Should Data Scientists be concerned?
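The core AutoML idea described above (letting a search procedure choose hyperparameters instead of tuning them by hand) can be sketched with plain scikit-learn. This is only an illustration of the concept, not anything from the talk; the dataset, model and search space below are assumptions.

```python
# Minimal sketch of automated hyperparameter search, the idea behind AutoML.
# The dataset, model and search space here are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# The "outer" search explores the hyperparameter space of the "inner" model.
search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [None, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=10,          # try 10 random configurations
    cv=5,               # score each one with 5-fold cross-validation
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```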
Microservices architectures are inherently distributed, and building such solutions always brings interesting challenges to the table: resilient service invocation, distributed transactions, on-demand scaling, idempotent message processing and more. Deploying microservices on Kubernetes doesn't solve these problems, and developers need to learn and use many SDKs on top of frameworks such as .NET, Java, Python, Golang and others. This session will show you how to overcome those challenges using Dapr: a portable runtime to build distributed, scalable, and event-driven microservices.
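As a flavour of what "portable runtime" means in practice, here is a minimal sketch of calling another service through the Dapr sidecar's HTTP service-invocation API from Python. It assumes a Dapr sidecar running on the default port 3500; the target app id ("orders") and method name ("checkout") are made-up examples, not part of the session.

```python
# Minimal sketch of Dapr service invocation through the sidecar's HTTP API.
# Assumes a Dapr sidecar on the default port 3500; the target app id
# ("orders") and the method name ("checkout") are illustrative assumptions.
import json
import requests

DAPR_PORT = 3500
url = f"http://localhost:{DAPR_PORT}/v1.0/invoke/orders/method/checkout"

# Dapr resolves "orders" to a healthy instance and handles retries and mTLS,
# so the caller does not need a service-discovery or resilience SDK.
response = requests.post(
    url,
    data=json.dumps({"item": "book", "qty": 1}),
    headers={"Content-Type": "application/json"},
)
print(response.status_code, response.text)
```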
Our purpose is to provide an analysis of the basic objectives and value propositions of any Customer Data Platform (CDP), encouraging discussion with participants and sharing our own experience. In this sense, we would like to take the opportunity to present a production use case of a multi-cloud Customer Data Platform. Afterwards, to enrich our presentation, we will open a discussion on the reasons for separating a CDP into two domains: the domain of personally identifiable data and the domain of anonymised data. We will then delve into the specific production use case, examining the value propositions for the end customer, for businesses and from an operational point of view. Through these points, we are convinced that the audience will clearly see the business and technical drivers for designing and building the CDP, not 100% in Salesforce and not 100% in GCP. To illustrate our presentation with a real case, we propose to deepen the discussion with a technical twist and share Making Science's experience in building a custom CDP with a cloud-first design and development. Among the points we will highlight are:
• A review of the GCP and Salesforce services that the solution used.
o A review of the GCP and Python-based technology stack and the development design used to continuously ingest signals and events from over 38 data sources.
• The management of the bi-directional exchange of signals with the client's website.
• The selection of serverless GCP technologies for ingesting signals from the customer's website while protecting the system from external attackers.
• The design approach to protect the solution from duplicate signal transmissions from streaming sources (see the sketch after this list).
• The no-harassment approach to continuous batch event processing.
• The design point of view to protect the solution from duplicate batch transmissions. We will walk through our design considerations with respect to signal/event publishing to CDP processes and external machine learning enrichment systems.
• The persistent keyless data store operating at the core of the CDP, giving the most up-to-date view of the client in both an anonymised and a de-anonymised view, depending on the domain.
• The bi-directional anonymisation/de-anonymisation gateway between Salesforce and GCP. The gateway had to support sending anonymised data to the marketing analytics domain within GCP and receiving custom engagement requests from the marketing analytics domain to the customer analytics domain. We will examine in detail the GCP technologies used to determine which attributes of a given data flow/feed needed to be anonymised.
• How signal enrichment was supported from both the personally identifiable data domain and the anonymised CDP analytics domain.
• The additional design and development steps we took to ensure GDPR compliance by leveraging features within GCP. We will also examine our implementation to ensure traceability of consent on a per-customer basis.
• The design and development approach and the technologies used to deliver 'human readable' analytics, even as the CDP's customer-centric data warehouse continually changes.
• The selection of our GCP serverless data warehouse, and a look at the design approaches applied to ensure efficient, consistent and governed access to data.
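The streaming-deduplication point above can be illustrated with a small, generic sketch: key each incoming signal on a stable message ID and skip IDs that have already been seen. This is not Making Science's actual design; the event fields and the in-memory store are illustrative assumptions (a production CDP would persist the seen IDs in a shared datastore).

```python
# Generic sketch of idempotent signal ingestion: drop duplicate deliveries by
# remembering a stable message ID. The event fields and the in-memory "seen"
# set are illustrative assumptions, not the talk's actual design.
import hashlib
import json

seen_ids = set()  # in production this would live in a shared store

def signal_key(event: dict) -> str:
    """Prefer the publisher's message ID; fall back to a content hash."""
    if "message_id" in event:
        return str(event["message_id"])
    payload = json.dumps(event, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def ingest(event: dict) -> bool:
    """Return True if the event was processed, False if it was a duplicate."""
    key = signal_key(event)
    if key in seen_ids:
        return False
    seen_ids.add(key)
    # ... write the signal to the CDP's data store here ...
    return True

# A retried delivery of the same event is processed only once.
evt = {"message_id": "abc-123", "type": "page_view", "user": "u-42"}
print(ingest(evt), ingest(evt))  # True False
```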
To conclude, we will put the spotlight on the key learnings from the multi-cloud, multi-domain Customer Data Platform implementation, as well as share Making Science's design approach to preparing a CDP for deployment in a cloud-agnostic manner.
"Data comes at us fast" is what they say. In fact, the last couple of years taught us how to successfully cleanse, store, retrieve, process, and visualize large amounts of data in a batch or streaming way. Despite these advances, data sharing has been severely limited because sharing solutions were tied to a single vendor, did not work for live data, came with severe security issues, and did not scale to the bandwidth of modern cloud object stores. Conferences have been filled for many years with sessions about how to architect applications and master the APIs of your services, but recent events have shown a huge business demand for sharing massive amounts of live data in the most direct, scalable way possible. One example is open datasets of genomic data shared publicly for the development of vaccines. Still, many commercial use cases share news, financial or geological data with a restricted audience where the data has to be secured. In this session, dive deep into an open-source solution for sharing massive amounts of live data in a cheap, secure, and scalable way. Delta Sharing is an open-source project donated to the Linux Foundation. It uses an open REST protocol to secure the real-time exchange of large datasets, enabling secure data sharing across products for the first time. It leverages modern cloud object stores, such as S3, ADLS, or GCS, to reliably transfer large datasets. There are two parties involved: data providers and recipients. The data provider decides what data to share and runs a sharing server. An open-sourced reference sharing server is available to get started with sharing Apache Parquet or Delta.io tables. Any client supporting pandas, Apache Spark™, Rust, or Python can connect to the sharing server. Clients always read the latest version of the data, and they can provide filters on the data (e.g., "country=ES") to read a subset of it. Since the data is presented as pandas or Spark dataframes, integration with ML frameworks such as MLflow or SageMaker is seamless.
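On the recipient side, the pandas path looks roughly like the sketch below, using the delta-sharing Python client. The profile file and the share, schema and table names are placeholders, not a real endpoint.

```python
# Minimal sketch of the recipient side of Delta Sharing using the delta-sharing
# Python client. The profile file and the share/schema/table names below are
# placeholders, not a real endpoint.
import delta_sharing

# The profile file is a small JSON document given to the recipient by the data
# provider; it contains the sharing server URL and a bearer token.
profile_file = "config.share"
table_url = profile_file + "#my_share.my_schema.my_table"

# Load the latest version of the shared table as a pandas DataFrame.
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```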
We are celebrating our last online meetup of 2021, trusting that at some point in 2022 we can return to in-person meetups and meet face to face again. In the meantime, we close the year with a spectacular talk by an old acquaintance who already visited Python Madrid in 2018 to talk about the "Panama Papers". He is Miguel Fiandor, who on this occasion will talk about a very current topic, the "Pandora Papers", and how this incredible piece of data journalism was possible thanks to Python and its ecosystem of data libraries. In this talk we will see what role Python and pandas play in a journalistic investigation like the Pandora Papers. We will look at the libraries we use in the ETLs that transform millions of documents into [mini] databases, the different scenarios that come up, and our favourite tools, some of them home-made. About Miguel Fiandor: I am a Computer Engineer from the Universidad Politécnica de Madrid. In 2012 I caught the data fever (open data, web scraping, etc.) and decided to take the plunge and start a personal project, 'Transparencia de Cuentas Públicas'. Financially it was not the best idea, so I started freelancing with Python and Django, and in one of those jumps I ended up at the ICIJ. I have spent six years as a developer of applications and databases for the ICIJ, the International Consortium of Investigative Journalists, which published the 'Panama Papers' and the 'Paradise Papers', among others. I use Python daily to develop the ETL processes that shape the millions of leaked documents, turning them into useful tools for journalists as well as public apps accessible to everyone.
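The kind of ETL step described above can be sketched generically with pandas: read a structured extract of document metadata, clean it, and load it into a small queryable database. The file name, columns and SQLite target below are illustrative assumptions, not the ICIJ pipeline.

```python
# Generic sketch of an ETL step: load structured metadata extracted from
# documents and turn it into a small queryable database. The file name, the
# columns and the SQLite target are illustrative assumptions.
import sqlite3
import pandas as pd

# Extract: read one batch of document metadata.
df = pd.read_csv("documents_metadata.csv")

# Transform: normalise names and parse dates so they can be joined later.
df["entity_name"] = df["entity_name"].str.strip().str.upper()
df["incorporation_date"] = pd.to_datetime(df["incorporation_date"],
                                          errors="coerce")

# Load: write the cleaned batch into a local "mini" database.
with sqlite3.connect("leak.db") as conn:
    df.to_sql("entities", conn, if_exists="append", index=False)
```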
Do you know #Ansible? 🤓 It is one of the most popular and useful automation tools for system administrators and #DevOps today. 🔁 It lets you centralise the configuration of numerous servers, network devices and #Cloud providers in a simple, automated way. ✅ Ansible manages its nodes over SSH and only requires #Python🐍 on the remote server where it is going to run. In this #MeetupsGeeksHubs we will see how to automate installation, update and deployment tasks for our applications, as well as the fundamentals of Ansible and how to deploy our service with ansistrano. Get to know Marcos: https://www.linkedin.com/in/fcoparo/ 🎥 Marcos Palacios' sessions at #MeetupsGeeksHubs 🎥 👉 Fundamentals "Hello RabbitMQ": https://www.youtube.com/watch?v=BMemTv8Jb8k&t=1593s 👉 HELLO, Infrastructure as Code (IaC): https://www.youtube.com/watch?v=Hr4qYmLTLQo&t=2s
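Ansible is normally driven from the command line and YAML playbooks rather than from Python code, but since it only needs SSH access plus Python on the managed hosts, a quick connectivity check can be scripted from Python. The inventory file and host group below are illustrative assumptions.

```python
# Quick sketch: drive an ad-hoc Ansible check from Python. Ansible needs only
# SSH access and Python on the managed hosts; the inventory file and the host
# group below are illustrative assumptions.
import subprocess

result = subprocess.run(
    ["ansible", "webservers", "-i", "inventory.ini", "-m", "ping"],
    capture_output=True,
    text=True,
)
print(result.stdout)   # each reachable host answers with "pong"
```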
Ismael Mendonça presents "Python testing best practices". Abstract: In Python you can use different frameworks and strategies for testing; the ecosystem provides libraries such as pytest, nose and unittest for writing automated tests. It is common to find projects where the use of these libraries is mixed, or where the scope and what each one offers is not well understood. The talk focuses on some gotchas to watch out for when writing tests, and several tips for writing better ones. --- The ninth edition of PyConES is held as a fully free online event on 2 and 3 October 2021. Web: https://2021.es.pycon.org Agenda: https://2021.es.pycon.org/#schedule
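As a small illustration of the kind of practices the talk covers, here is a minimal pytest example using a per-test fixture and plain asserts; the test subject itself is made up.

```python
# Minimal pytest example, as an illustration of the kind of practices the talk
# covers (the test subject here is made up). Save as test_cart.py and run
# with: pytest -q
import pytest

@pytest.fixture
def cart():
    # A fresh fixture per test avoids the gotcha of shared mutable state
    # leaking between tests.
    return []

def test_add_item(cart):
    cart.append("book")
    assert cart == ["book"]   # plain assert: pytest shows a rich diff on failure

def test_cart_starts_empty(cart):
    assert cart == []         # not affected by the previous test
```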
Carlos Alberto Gomez Gonzalez presents "Deep learning-based super-resolution of climate forecast data". Abstract: Seasonal climate predictions can forecast climate variability up to several months ahead and support a wide range of societal activities. The coarse spatial resolution of seasonal forecasts needs to be refined to the regional/local scale for specific applications. Super-resolution, or statistical downscaling in the climate jargon, aims at learning a mapping between low- and high-resolution images (gridded climate datasets). In this talk, I would like to explain how I developed deep convolutional networks in supervised and generative adversarial training frameworks for the task of super-resolving seasonal forecasts of temperature over Catalonia. Additionally, I will stress the importance of Python for scientific software development and for the application of cutting-edge machine learning and AI in Earth Sciences. --- The ninth edition of PyConES is held as a fully free online event on 2 and 3 October 2021. Web: https://2021.es.pycon.org Agenda: https://2021.es.pycon.org/#schedule
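To make "super-resolution of gridded data" concrete, below is a generic SRCNN-style sketch in PyTorch: interpolate the coarse grid, then let a small CNN refine the details. This is not the speaker's model; the grid sizes and the upscaling factor are illustrative assumptions.

```python
# Generic sketch of a super-resolution CNN (SRCNN-style) for gridded data.
# This is not the speaker's model; the grid sizes and upscaling factor are
# illustrative assumptions.
import torch
import torch.nn as nn

class SimpleSR(nn.Module):
    def __init__(self, scale: int = 4):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=scale, mode="bicubic",
                                    align_corners=False)
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),
        )

    def forward(self, x):
        # Interpolate the coarse grid first, then refine the details.
        return self.net(self.upsample(x))

# One coarse temperature field on a 16x16 grid -> 64x64 output.
low_res = torch.randn(1, 1, 16, 16)
model = SimpleSR(scale=4)
print(model(low_res).shape)  # torch.Size([1, 1, 64, 64])
```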
Andrea Morales Garzón and Antonio Manjavacas Lucas present "An introduction to bio-inspired algorithms with Python". Abstract: Nature has always been a source of inspiration for solving everyday problems, from the simplest to the most complex. Artificial Intelligence has a lot to contribute here: there is an entire field of study focused on how nature and living beings solve complex processes efficiently, with the goal of replicating that behaviour to optimise the solution of computationally expensive problems. Algorithms based on this idea are known as "bio-inspired". The goal of this talk is to take a closer look at what bio-inspired algorithms are and how they can be put into practice using Python. In particular, we will focus on the inspyred package, oriented towards bio-inspired computing, and show its main features. Finally, we will look at some practical examples aimed at solving optimisation problems. --- The ninth edition of PyConES is held as a fully free online event on 2 and 3 October 2021. Web: https://2021.es.pycon.org Agenda: https://2021.es.pycon.org/#schedule
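To give a concrete flavour of the idea, here is a library-free sketch of a tiny genetic algorithm minimising a toy function. The talk itself relies on the inspyred package, so treat this only as a generic illustration of the bio-inspired recipe: selection, crossover and mutation.

```python
# Library-free sketch of a tiny genetic algorithm minimising f(x) = sum(x_i^2).
# The talk uses the inspyred package; this is only a generic illustration of
# the bio-inspired recipe: selection, crossover and mutation.
import random

def fitness(ind):
    return sum(x * x for x in ind)          # lower is better

def make_individual(dim=5):
    return [random.uniform(-5, 5) for _ in range(dim)]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.2):
    return [x + random.gauss(0, 0.5) if random.random() < rate else x
            for x in ind]

population = [make_individual() for _ in range(30)]
for generation in range(100):
    population.sort(key=fitness)
    parents = population[:10]               # selection: keep the fittest
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(20)]
    population = parents + children         # next generation

print(round(fitness(min(population, key=fitness)), 4))
```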
Jaime Crespo nos presenta "Haciendo copias de seguridad de todo el conocimiento humano con Python y software libre": Resumen: ¿Te imaginas que un día desaparecen todos los artículos y fotos de Wikipedia? De acuerdo con algunas estimaciones se tardarían cientos de millones de horas en reescribir el casi Petabyte de datos generado hasta el momento por los voluntarios del proyecto. En esta sesión mostraremos cómo evitamos que esto pueda suceder, enseñando cómo implementamos diversos proyectos de recuperación de los archivos y bases de datos usados por Wikipedia. En ellos usamos exclusivamente herramientas de software libre y como principal lenguaje de automatización de sistemas: Python. --- La novena edición de la PyConES se celebra como un evento en línea y totalmente gratuito durante los días 2 y 3 de Octubre de 2021. Web: https://2021.es.pycon.org Agenda: https://2021.es.pycon.org/ #schedule
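A tiny, generic flavour of the automation involved in backup work: verifying that a copied dump matches its source by comparing checksums. The paths are placeholders; this is not the actual Wikimedia tooling.

```python
# Generic flavour of backup automation: verify that a copied dump matches the
# original by comparing SHA-256 checksums. The paths are placeholders; this is
# not the actual Wikimedia tooling.
import hashlib
from pathlib import Path

def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

source = Path("dumps/pages-articles.xml.bz2")
copy = Path("backups/pages-articles.xml.bz2")

if sha256sum(source) == sha256sum(copy):
    print("backup verified")
else:
    print("checksum mismatch: backup is corrupt or incomplete")
```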