
Over the last year, workforces everywhere have had to navigate the pandemic’s many ups and downs. For many, offices were re-opened, then closed again. If there is anything we’ve learned from this, it’s that we need to better prepare our organizations to be flexible and adapt quickly to change.
While many companies struggle to understand how AI will change their industries, wildlife conservation has no lack of immediate challenges for the data science and machine learning skills often locked up in large, for-profit organizations. From spotted hyenas to leafy seadragons, there is an amazing window of opportunity for technologists to be a part of deeply meaningful discoveries in collaboration with a research community flooded with data and hungry to break new ground in understanding endangered species’ population sizes, movement, social behavior, culture, and even language. Join Wild Me’s Executive Director Jason Holmberg to learn our story of forming a nonprofit, full-time team of software and machine learning engineers focused on supporting wildlife conservation in partnership with biologists across the globe. Every day, we see a growing need for technologists like you to be full team members in the fight against the Sixth Mass Extinction. There are questions about the natural world – entirely new discoveries – that you may be the key to unlocking.
Over the past decade, the global space industry has been growing at an accelerated pace: from reusable rockets, to opportunities for private space flights and tourism, to up-and-coming cis-lunar activities by both governmental agencies and private companies. Space as the next frontier is fueling new interest from investors and startups across the globe. The speaker will explore the current and future trends of this exciting new industry and how AI and big data play a role in solving humanity’s challenges today, as well as in its outward migration off planet.
It is common to have many doubts when dealing with real estate markets. When looking for a house, we usually ask ourselves “How much should that house be worth?” or “Is there a neighborhood similar to the one I like?” Conversely, when selling a house, apart from the obligatory doubt, “At which price should I list the ad?”, we might ask ourselves “Why can’t I sell my apartment?” or “How can I speed up the sale of my house?”. To answer all these questions, within the idealista Data Crew team we strive to improve our products and create new ones through advanced data exploitation. Today we share our experience with a project that tries to answer a few questions from those who face the hard task of selling a house. Selling a home is not easy. First we have to get an idea of its value, which can be difficult without information about the price of similar houses. Moreover, if we want to sell the house quickly, we have to set a relatively low price to make the listing attractive. In order to choose the right balance between asking price and marketing time, we need to know how long it takes on average for houses to sell, but also how each feature of the ad (in particular its asking price) matters in determining the marketing time. For many of the questions we face as data scientists we rely on regression or classification models. Here we make an exception by applying Survival Analysis, a branch of statistics that deals with predicting the time until an event occurs. While this technique has its origin in medical research, today survival analysis is used in many areas: manufacturing companies use it to predict the life of their machines, for instance, and sales departments use it to predict customer churn. At idealista we have data from millions of ads for properties for sale and rent in Spain, Italy, and Portugal. We analyze the life of each ad throughout all its phases, from market entry to exit.
Each ad is characterized by an expected time on the market (its “life expectancy”) due to factors that affect its sale probability. Some of these factors depend on property characteristics, others on market conditions. Our goal is to identify which of these factors matter in determining the likelihood of a sale. We use a standard Cox Proportional Hazard model, which, due to its simplicity and the interpretability of its results, is usually considered the workhorse model in survival analysis. We feed the model with information about the most recent ads that passed through the idealista portal, and the model learns the key characteristics that make a listing attractive to the market. Equipped with this knowledge, the model estimates the expected marketing time and the sale and rental probabilities of any advertisement currently listed on the portal. In addition, the model opens the door to simulating “what if?” scenarios in which, for instance, we wonder how the marketing time varies if we renovate the property, reduce the asking price, and so on… What do we learn from all this? That there is life beyond regression and classification models – for instance, survival analysis. Although this technique comes from the medical field, quite far from the digital and real estate worlds, it yields very interesting results. For this reason, in the idealista Data Crew team we are always open to considering what is being done with data in other sectors, no matter how different they are from ours.
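The core idea of survival analysis – estimating the probability that a listing is still on the market after t days, while correctly handling ads that had not yet sold when observation ended (right-censored data) – can be sketched with a minimal Kaplan-Meier estimator. The ad lifetimes below are invented for illustration; idealista’s actual model is a Cox Proportional Hazard model, typically fitted with a dedicated library rather than hand-rolled code like this.

```python
# Minimal Kaplan-Meier estimator: S(t) = probability that a listing
# is still on the market after t days. Ads still listed at the end of
# the observation window are right-censored (event=False).
def kaplan_meier(durations, events):
    """Return a list of (t, S(t)) pairs, one per distinct sale time t."""
    obs = sorted(zip(durations, events))  # sort observations by duration
    n_at_risk = len(obs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(obs):
        t = obs[i][0]
        sold = 0     # ads sold exactly at time t (the "event")
        removed = 0  # all ads leaving the risk set at time t (sold or censored)
        while i < len(obs) and obs[i][0] == t:
            removed += 1
            if obs[i][1]:
                sold += 1
            i += 1
        if sold:
            surv *= 1 - sold / n_at_risk  # Kaplan-Meier product-limit step
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

# Hypothetical ad lifetimes in days; event=True means the ad was sold,
# event=False means it was still listed when observation ended (censored).
durations = [30, 45, 45, 60, 90, 90, 120, 150]
events = [True, True, False, True, True, False, True, False]
print(kaplan_meier(durations, events))
```

Unlike naively averaging the durations of sold ads, this estimator uses the censored listings too: each one keeps the risk set larger, so the survival curve is not biased toward fast sellers.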
Outer Space is getting more and more congested and contested. Governmental and private, scientific and commercial, as well as civilian and military space activities are expanding at an unprecedented speed. What started as a US-Soviet space race has grown to more than 70 countries operating their own satellites. Furthermore, a global space economy has developed, which now amounts to more than 350 billion Euros annually and is expected to reach 1 trillion Euros annually by 2040. A space business sector has thus developed, comprising upstream (satellites, launchers, operations) and downstream (services, applications) activities in more and more countries, through large, mid-cap, SME and startup companies. Space assets have become critical infrastructure and therefore need particular protection. This, however, also leads to a “securitization” of space activities, meaning that security considerations are becoming ever stronger. It is mirrored in the growing use of space for military purposes and the establishment of space commands and space forces in some countries. The threat of deliberately disturbing space activities through jamming, spoofing or blinding is also increasing. In addition, the operation of mega-constellations with thousands or even tens of thousands of space objects is becoming a reality. Together with the still unresolved problem of space debris, the congestion of the orbits around the Earth increases further. In this situation, we have to imagine a new way of using outer space. We have to establish Space Traffic Management (STM). This will be a huge technological, economic and diplomatic challenge – the biggest challenge for spaceflight in this decade. If we do not start acting now, we might face by the end of the decade what we are facing in climate policy today: tipping points that make us lose control of the (space) environment.
Space Traffic Management (STM) is a complex, multifaceted task, comprising an enormous breadth of disciplines and actors. It can be structured into three distinct aspects: diplomatic, economic and technological. These are discussed in the following: 1) The diplomatic aspect of STM: Space law has so far concentrated on defining the status of outer space and of the actors in space. STM will considerably change the focus, in that it requires the establishment of rules of behaviour in outer space. This is a new paradigm for regulating space activities and will require completely new formats of rule development, transparency building and concrete traffic control. Orientation can be given by international institutions such as the International Telecommunication Union (ITU) or the International Civil Aviation Organization (ICAO). 2) The economic aspect of STM: If setting up STM fails, the target of a 1 trillion global space economy in 2040 will not be achievable and, on the contrary, business opportunities will decline. STM itself is also a driver for space business, in that it offers huge opportunities for companies providing services to space operators. This growing market also provides splendid opportunities for startups. 3) The technological aspect of STM: Organising the traffic of thousands or even tens of thousands of space objects travelling at speeds of around 30,000 km/h is extremely complex and difficult. Data provision and management, collision avoidance and the cleaning up of orbital space debris all require completely new approaches, including AI. Discussing these three structural aspects of STM should make clear how big the challenge is, in view of avoiding damage to science, the economy and security, and in view of organising STM on a global scale. It should, however, also be apparent that establishing STM means breakthrough technological innovation and political opportunities for peaceful cooperation.
STM is the task for this decade. Let us start shaping and establishing it now!
When looking for applications of Natural Language Processing models, the law sector clearly stands out as a prime candidate. A significant share of professional lawyers’ time can be spent organizing and going over large amounts of documents, either in the form of actual paper or as digital scans of varying quality. Therefore, automated tools that help classify and navigate all this information are an invaluable aid in optimizing time and costs. In the last few years, the advent of large language models such as BETO or GPT has greatly broadened the applicability and efficacy of Natural Language Processing (NLP) solutions. Using a language model as the foundation of an NLP solution has made it possible to produce state-of-the-art results in highly complex tasks such as machine translation, question answering, summarization, language generation and many others. Seemingly, language models have become a kind of magic wand that can solve any NLP task, as long as one chooses the correct pre-trained model and tunes it appropriately. And certainly, the results that can be achieved on open datasets by following this recipe are impressive. But when the rubber hits the road in actual applications, real-world data proves to be far more difficult to handle: poor quality scans, documents running to hundreds or thousands of pages, or the unavailability of public datasets for the problem at hand are just a few of the challenges that must be overcome. In this talk we will present the details of “Mapa del Expediente”, a joint R+D project between IIC and the law firm Garrigues. The project applies the latest advances in Spanish language models to organize and classify all the documentary information relating to a case, aiding the lawyer in navigating and perusing all this information.
The system we developed can work with raw PDF files in the form of image scans, running to thousands of pages and combining in the same PDF file a wealth of different kinds of documents with no index or clear-cut boundaries. Using a pipeline of custom language models, optical character recognition and preprocessing tools, our system is able to extract digital text from the PDF file, discard pages with no useful information, break the file down into each of its logical documents, classify each of them into a taxonomy, and detect mentions of relevant entities such as persons or companies. This produces a highly structured version of the case files, which can then be integrated with a fuzzy search engine and visualization tools to allow easy navigation through all the information, as well as to produce graphs revealing the connections between all the individuals, organizations and documents in the case. Mapa del Expediente is the product of an interdisciplinary team integrating experts in computational linguistics, data scientists, computer engineers and lawyers. This has allowed us to create and annotate our own corpora, develop custom tools and fine-tune all language models to the project’s needs, which has proven key to its success. Also as part of this talk we will introduce LegalBETO, a Spanish language model developed within this project and specialized for the legal domain. LegalBETO produces the best results on the benchmarks run with real case file documents, outperforming all publicly available models for the Spanish language, both general-domain and legal-domain.
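The splitting and classification stages described above can be sketched with a toy pipeline: given per-page OCR text, drop useless pages, cut the file into logical documents at pages that look like document headers, and tag each document with a coarse type. In the real system each stage is a fine-tuned language model; the keyword rules, header patterns and labels below are invented stand-ins for illustration only.

```python
# Hypothetical sketch of the document-splitting stage of a case-file pipeline.
import re

# Toy header patterns for common Spanish legal document openings (invented).
HEADER_PATTERNS = [r"^DEMANDA", r"^SENTENCIA", r"^CONTRATO", r"^PODER"]

def looks_like_header(page_text):
    """Heuristic boundary detector: does the page open like a new document?"""
    first_line = page_text.strip().splitlines()[0]
    return any(re.match(p, first_line.upper()) for p in HEADER_PATTERNS)

def split_documents(pages):
    """Group consecutive OCR'd pages into logical documents."""
    docs, current = [], []
    for page in pages:
        if not page.strip():  # discard pages with no useful text
            continue
        if looks_like_header(page) and current:
            docs.append(current)  # a new header closes the previous document
            current = []
        current.append(page)
    if current:
        docs.append(current)
    return docs

def classify(doc_pages):
    """Toy keyword classifier standing in for a fine-tuned legal model."""
    text = " ".join(doc_pages).upper()
    for label in ("SENTENCIA", "DEMANDA", "CONTRATO"):
        if label in text:
            return label.lower()
    return "otros"

pages = [
    "DEMANDA de juicio ordinario...",
    "continuación de la demanda, página 2",
    "",  # blank or unreadable scan, discarded
    "SENTENCIA nº 123/2021...",
]
docs = split_documents(pages)
print([(classify(d), len(d)) for d in docs])
```

The value of structuring the output this way is that every downstream feature (taxonomy browsing, entity graphs, fuzzy search) operates on logical documents rather than on an undifferentiated stream of scanned pages.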
“Data comes at us fast,” as they say. In fact, the last couple of years taught us how to successfully cleanse, store, retrieve, process, and visualize large amounts of data in a batch or streaming way. Despite these advances, data sharing has been severely limited, because sharing solutions were tied to a single vendor, did not work for live data, came with severe security issues, and did not scale to the bandwidth of modern cloud object stores. Conferences have been filled for many years with sessions about how to architect applications and master the APIs of your services, but recent events have shown a huge business demand for sharing massive amounts of live data in the most direct, scalable way possible. One example is open datasets of genomic data shared publicly for the development of vaccines. Many commercial use cases, meanwhile, share news, financial or geological data with a restricted audience where the data has to be secured. In this session, dive deep into an open source solution for sharing massive amounts of live data in a cheap, secure, and scalable way. Delta Sharing is an open source project donated to the Linux Foundation. It uses an open REST protocol to secure the real-time exchange of large datasets, enabling secure data sharing across products for the first time. It leverages modern cloud object stores, such as S3, ADLS, or GCS, to reliably transfer large datasets. There are two parties involved: Data Providers and Recipients. The data provider decides what data to share and runs a sharing server; an open-sourced reference sharing server is available to get started with sharing Apache Parquet files or Delta tables. Any client supporting pandas, Apache Spark™, Rust, or Python can connect to the sharing server. Clients always read the latest version of the data, and they can provide filters on the data (e.g., “country=ES”) to read a subset of it.
Since the data is presented as pandas or Spark dataframes, integration with ML frameworks such as MLflow or SageMaker is seamless.
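To make the provider/recipient exchange concrete, the sketch below builds, with the standard library only, the REST request a recipient sends to a sharing server for one table, including predicate hints so the server can prune data server-side. The field names follow the open Delta Sharing protocol as I understand it; the endpoint, token, share, schema and table names are all invented, and no network call is made.

```python
# Sketch of what a Delta Sharing client does under the hood: read the
# provider-issued profile and build the table query request.
import json

# A profile file is a small JSON document the provider hands to the recipient.
profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing",
    "bearerToken": "dapi-not-a-real-token",
}

def table_query(profile, share, schema, table, predicates=None, limit=None):
    """Build (url, headers, body) for the protocol's table query endpoint."""
    url = "{}/shares/{}/schemas/{}/tables/{}/query".format(
        profile["endpoint"].rstrip("/"), share, schema, table)
    headers = {
        "Authorization": "Bearer " + profile["bearerToken"],
        "Content-Type": "application/json",
    }
    body = {}
    if predicates:
        body["predicateHints"] = predicates  # e.g. ["country = 'ES'"]
    if limit is not None:
        body["limitHint"] = limit  # server may return fewer rows
    return url, headers, json.dumps(body)

url, headers, body = table_query(
    profile, "vaccines", "genomics", "sequences",
    predicates=["country = 'ES'"], limit=1000)
print(url)
```

In practice the official `delta-sharing` Python client wraps this whole exchange, exposing roughly a one-liner such as `delta_sharing.load_as_pandas("profile.json#vaccines.genomics.sequences")` that returns a pandas dataframe.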
Everyone claims that they are building products in an agile, user-oriented way. And yet, the majority of products are still not welcomed by customers after release. There is a huge gap between what we think it means to be user-oriented and actually being user-oriented. In this talk I will present the personal methodology I have developed over the last 9 years, which has helped me build multiple highly successful products and services. The framework consists of the following pillars:
Identify the problem. The problem should be:
* A real customer problem
* A problem that is top priority for the customer
Identify the solution:
* Is this the simplest solution for the problem?
* Is it easily integrable with the customer’s current infrastructure?
* Does it solve a problem that will still exist (and will remain the primary problem) by the time it is delivered?
* Is it easily pluggable into the customer’s existing infrastructure?
* Is the cost of adopting the product smaller (ideally, drastically lower) than the benefit the customer will get?
Execute:
* Do not aim to deliver the solution; aim to onboard the customer.
* It is much simpler to deliver a solution for one customer and generalize later than to deliver a generic solution that can be used by at least one customer.
* The only criterion of success is an onboarded customer.
We will show in detail how to build a team that executes in accordance with these principles. Some examples of key fundamentals:
* Success of the product is everyone’s job.
* Engineers working on the product are directly responsible for onboarding the customer onto what they are building.
* Done-done-done is not when code is deployed to production; it is when the customer is using it.
While these are very common things that all managers/PMs claim to be doing, there are many hidden traps when leaders try to put them into practice. Just some of them:
* This process is about incremental changes; my product is revolutionary.
* We have this amazing tool/technology/solution; let’s find the problem it can solve for our customers.
* I have validated the problem, therefore my solution is also validated.
* I put the word “user” in my goal, therefore it is now a user-oriented goal.
* Validating the problem with a different user audience than the one you are building the product for. It is very simple to pitch a free product to students and ask if they would use it for free; it is a completely different thing to sell the same product for money to a CTO.
* Ignoring the last mile of integration.
* Sticking to the original plan/scope while users’ problems have clearly changed.
We will also discuss how to start transitioning an organization into this mode, how to start refactoring the engineering team and the whole organization, and what the concrete first steps should be, such as:
* Do a revision of the roadmap: put a concrete problem next to each feature, re-evaluate the problems, and stack-rank problems (instead of features).
* Refactor incentives for the engineering team to start onboarding customers (solving problems for the customers) instead of just delivering features.
* Introduce processes for building deep connections with customers and collecting/evaluating their problems.
We live in a world of processes. Whether you find yourself following them on a daily basis or just in specific situations, what we can all agree on is that they can always be improved. The question resides in how and, consequently, how much it is going to cost. Traditionally, in order to know the processes deeply, a non-negligible number of sessions with every agent involved was needed. That situation led to long projects, biased results and a lot of information filling papers and presentations. Additionally, this deep dive into the processes will only give you information about the current status of the process, typically named the as-is. But who will define the to-be? And on what basis? As part of this exercise, solutions would have to be studied in order to find the appropriate justification, and every business question would need a specific analysis associated with it, tying up even more resources before having any assurance that they will work. Through Artificial Intelligence, processes can be modelled – no matter how many data sources they involve – analyzed inside out, and improved, taking less effort and time than it used to take. The good news is that no matter how techie you or your team are, emerging technologies have never been so accessible and user-friendly. Thus, this new scenario merges both worlds, sitting business and analytics teams at the same table, where they all speak the same language and share the same concerns. This is not only our point of view, but also Gartner’s, as they stated in their 2020 Guide: “Process mining helps enterprise architecture understand operations and performance in order to create operational resilience”. Let me stop at this last idea and dive into the details.
What we are saying is that regardless of how fragmented the information is across the different systems managing the process, and even regardless of whether every piece of the puzzle exists, by applying Process Mining techniques you will be able to reconstruct the puzzle and analyze your process from an end-to-end point of view. Once the process is reconstructed in a data-driven manner, it can be zoomed in and out as far as the granularity of the data allows, revealing hidden variants that would not have been discovered the traditional way. With the right interpretation and by applying advanced analytics techniques, we can easily perform a root cause analysis that will show the guidelines to follow (and sometimes even implement them). Finally, it can be the foundation of more advanced solutions such as:
• Real-time monitoring assets that help achieve the target values of business KPIs;
• Alert systems that can anticipate issues;
• Next-best-action engines that can recommend to the agent the best course of action in a particular situation;
• And many more.
The benefits of Process Mining are countless, and the main reason is that they are based on data – and as you know, if you let data speak, the organization gets aligned instantly. This is where the worlds converge: after the analytical eye has structured and analyzed the data, it can be shown in an intuitive and interactive way to the main owners of the processes, and all together, in a question-and-answer session, they can arrive at significant results. We firmly believe that applying Process Mining techniques can save time, money and effort for any organization by enabling data-driven decisions. We would love to share with you our vision and the principal actors in the market.
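The first reconstruction step described above can be sketched in a few lines: given a raw event log of (case id, activity, timestamp) triples scattered across systems, grouping events by case and ordering them by time yields the path each case actually followed, and counting identical paths reveals the process variants and their frequencies. The event log below is invented for illustration.

```python
# Minimal process-mining sketch: discover process variants from an event log.
from collections import Counter, defaultdict

# Raw event log: (case_id, activity, timestamp). In practice these rows
# come from several operational systems and must first be correlated.
events = [
    ("c1", "Receive order", 1), ("c1", "Check credit", 2), ("c1", "Ship", 3),
    ("c2", "Receive order", 1), ("c2", "Ship", 2),  # this case skips the check
    ("c3", "Receive order", 1), ("c3", "Check credit", 2), ("c3", "Ship", 3),
]

def discover_variants(events):
    """Return a Counter mapping each activity path to its case count."""
    traces = defaultdict(list)
    for case_id, activity, ts in events:
        traces[case_id].append((ts, activity))
    variants = Counter()
    for steps in traces.values():
        path = tuple(activity for _, activity in sorted(steps))
        variants[path] += 1
    return variants

for path, count in discover_variants(events).most_common():
    print(count, " -> ".join(path))
```

Even this toy version surfaces the kind of insight the abstract describes: one variant skips the credit check entirely, a deviation that might never have come up in interview-based as-is sessions.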
As individuals, we use time series data in everyday life all the time. If you’re trying to improve your health, you may track how many steps you take daily and relate that to your body weight or size over time to understand how well you’re doing. This is clearly a small-scale example, but at the other end of the spectrum, large-scale time series use cases abound in our current technological landscape: tracking the price of a stock or cryptocurrency that changes every millisecond, performance and health metrics of a video streaming application, sensors reading temperature, pressure and humidity, or the information generated by millions of IoT devices. Modern digital applications require collecting, storing, and analyzing time series data at extreme scale, and with performance that a relational database simply cannot provide. We have all seen very creative solutions built to work around this problem, but as throughput needs increase, scaling them becomes a major challenge. To get the job done, developers end up landing, transforming, and moving data around repeatedly, using multiple components pipelined together. Looking at these solutions really feels like looking at Rube Goldberg machines. It’s staggering to see how complex architectures become in order to satisfy the needs of these workloads. Most importantly, all of this is something that needs to be built, managed, and maintained, and it still doesn’t meet very high scale and performance needs. Many time series applications can generate enormous volumes of data. One common example is video streaming. Delivering high quality video content is a very complex process, and understanding load latency, video frame drops, and user activity is something that needs to happen at massive scale and in real time. This process alone can generate several GBs of data every second, while easily running hundreds of thousands, sometimes over a million, queries per hour.
A relational database certainly isn’t the right choice here. Which is exactly why we built Timestream at AWS. Timestream started out by decoupling data ingestion, storage, and query such that each can scale independently. The design keeps each sub-system simple, making it easier to achieve unwavering reliability, while also eliminating scaling bottlenecks and reducing the chances of correlated system failures, which becomes more important as the system grows. At the same time, in order to manage overall growth, the system is cell-based: rather than scaling the system as a whole, we segment it into multiple smaller copies of itself, so that these cells can be tested at full scale and a problem in one cell can’t affect activity in any of the other cells. In this session, I will describe the time-series problem, take a look at some architectures that have been used in the past to work around it, and then introduce Amazon Timestream, a purpose-built database to process and analyze time-series data at scale. I will discuss the architecture of Amazon Timestream, demo how it can be used to ingest and process time-series data at scale as a fully managed service, and show how it can be easily integrated with open source tools like Apache Flink or Grafana.
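To give a feel for the ingestion side, the sketch below builds the payload shape that Timestream’s WriteRecords API expects, using plain dicts (with boto3 you would pass these to `write_records` on a `timestream-write` client; no AWS call is made here). The dimension and measure names, region, and values are all invented for illustration, and common attributes factor out the fields shared by every record in the batch.

```python
# Sketch of a WriteRecords payload for the video-streaming metrics example.
import time

def frame_drop_record(frame_drops, measure_time_ms):
    """One time-series data point: frame drops observed at a given instant."""
    return {
        "MeasureName": "frame_drops",
        "MeasureValue": str(frame_drops),   # values are passed as strings
        "MeasureValueType": "BIGINT",
        "Time": str(measure_time_ms),
        "TimeUnit": "MILLISECONDS",
    }

now_ms = int(time.time() * 1000)

# Fields shared by every record in the batch go into CommonAttributes,
# which keeps the per-record payload small at high ingest rates.
common_attributes = {
    "Dimensions": [
        {"Name": "region", "Value": "eu-west-1"},
        {"Name": "stream_id", "Value": "abc-123"},
    ],
}
records = [frame_drop_record(d, now_ms + i) for i, d in enumerate([0, 3, 1])]

# With boto3 (not executed here), the call would look roughly like:
# boto3.client("timestream-write").write_records(
#     DatabaseName="video_metrics", TableName="playback",
#     CommonAttributes=common_attributes, Records=records)
print(len(records), records[0]["MeasureValueType"])
```

Batching records under shared dimensions like this is what lets the ingestion sub-system described above absorb GBs per second without each writer resending identical metadata.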