Data science in street mode


Accurate machine learning models require complete and relevant information, but in real life missing data, missing labels, limited resources and other such details are the bread and butter of data science. As a result, creativity and mastery of other areas become relevant to the data scientist's profile.

A good example of the polarization of this profile is the comparison with a musician. On the one hand, there are those who trained in a conservatory or a university; on the other, those who acquired the skill empirically in bars and on the streets.

What perceptions might the parties involved (customer and scientist) have of machine learning and data science applications in specific cases?

Market assumptions

"Artificial intelligence is so sophisticated that it can solve all my problems, from replacing my employees, to designing the intelligent software pieces for the next iteration."

- Manager of some company.

Although the assumption above is not meant to generalize about Colombian companies, there is a whole spectrum of positions on the use of machine learning. It ranges from mature companies that have their own data analysis teams to the opposite extreme: companies that lack the profiles the area requires. That gap leads to a high degree of misinformation about the processes and scope of artificial intelligence, fueled by the high expectations that commercial departments create around a product or service.

Data science: Scientist's assumptions

"This problem is not going to be solved properly, because they are not giving me the data that fully describes the business." - Data scientist in some company.

From the scientist's perspective, in turn, there can be resistance to working in a context that is often not ideal, or that simply lacks the necessary resources.

This comparison is not intended to classify either extreme as good or bad; each case simply has different applications. From our point of view, finding the balance between knowledge, flexibility and creativity favors customer satisfaction and gives the scientist more skills to approach the problem from different angles.

Need and challenge


In our context, the challenge for artificial intelligence is to reach a mature relationship with industry, one that makes it possible to meet product objectives and to consolidate knowledge of the area, its limits and its pedagogical applications as part of the iteration process.

Below we list three cases related to market assumptions about leveraging data, ordered from the most pessimistic context to the ideal one. Each case presents the client's initial interest, the work actually done, and the methodologies, techniques and materials used during the process.

Case 1: Identification in text

Interest: Transform the medical process using artificial intelligence.

The real work: Identify medications, pathologies and events of interest in medical records written in free text.

Material used: Bidirectional neural network + convolutional network for character-level representation + GloVe for word-level representation + conditional random field for modeling tag sequences.

[Figure: text identification box]
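
To make the idea of "tag sequences" concrete, here is a minimal sketch of how the output of such a tagger might be turned into entities. It assumes the common BIO tagging scheme (B- opens an entity, I- continues it, O is outside any entity); the scheme, the label names PATHOLOGY and MEDICATION, the function `bio_to_spans` and the example sentence are all illustrative assumptions, not details of the actual project.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) entity spans."""
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: close the one that was open, if any.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag that does not match) closes any open entity.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label = None
            current_tokens = []

    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hypothetical sentence with hand-written tags (not real clinical data).
tokens = ["Patient", "with", "type", "2", "diabetes", "started", "metformin", "."]
tags = ["O", "O", "B-PATHOLOGY", "I-PATHOLOGY", "I-PATHOLOGY", "O", "B-MEDICATION", "O"]

spans = bio_to_spans(tokens, tags)
print(spans)
```

In a setup like the one listed above, the tags would come from the model rather than being written by hand; the conditional random field is the piece that scores whole tag sequences so that transitions such as O followed by I-PATHOLOGY become unlikely.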

Case 2: Price estimation

Interest: Predicting the commercial value of an apartment in Bogotá.

The real work: Consolidate, explore and model the data set, to estimate the commercial value of an apartment according to some of its characteristics.

Material used: Feature engineering + neural networks (wide & deep model) + descriptive analysis of the error.

[Figure: price estimate table]
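
As an illustration of what a descriptive analysis of the error can look like for a price model, the sketch below summarizes the differences between predicted and actual values. The function `describe_errors`, the chosen statistics and the numbers are assumptions made for this example; they are not the metrics or results of the project.

```python
import statistics


def describe_errors(actual, predicted):
    """Summarize prediction errors of a regression model."""
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    # Relative error with respect to the actual price (actual must be non-zero).
    pct_errors = [abs(e) / a for e, a in zip(errors, actual)]
    return {
        "mean_error": statistics.mean(errors),  # sign shows over/underestimation
        "mean_abs_error": statistics.mean(abs_errors),
        "median_abs_error": statistics.median(abs_errors),
        "mean_abs_pct_error": statistics.mean(pct_errors),
        "max_abs_error": max(abs_errors),
    }


# Made-up apartment prices, in millions of pesos, for illustration only.
actual = [200.0, 400.0, 500.0, 800.0]
predicted = [220.0, 380.0, 500.0, 700.0]

summary = describe_errors(actual, predicted)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Looking at several statistics at once helps in a conversation with the client: a negative mean error with a large maximum error, for example, suggests the model underestimates a few expensive apartments rather than being uniformly off.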

Case 3: Prediction in economics

Interest: Anticipate the sectoral economic trend.

The real work: Consolidate, explore, sort, predict and visualize the history of commercial transactions and Google search trends.

Material used: Time series + feature engineering + random forests + eXtreme Gradient Boosting + Google Trends.

[Figure: economy prediction]
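
One common way to combine time series with feature engineering is to turn the history into lagged features, so that a tabular model such as a random forest or a gradient boosting model can be trained on it. The sketch below shows that step only; the function `make_lag_features`, the choice of two lags and the sample numbers are assumptions for illustration, not the pipeline used in the project.

```python
def make_lag_features(series, exog, n_lags):
    """Build (features, target) rows from a series and an external signal.

    Each row holds the previous `n_lags` values of `series` plus the previous
    value of `exog`; the target is the current value of `series`.
    """
    if len(series) != len(exog):
        raise ValueError("series and exog must have the same length")

    rows, targets = [], []
    for t in range(n_lags, len(series)):
        lags = list(series[t - n_lags:t])  # values at t-n_lags, ..., t-1
        rows.append(lags + [exog[t - 1]])  # only information known before t
        targets.append(series[t])
    return rows, targets


# Made-up monthly values, for illustration only.
transactions = [100, 110, 105, 120, 130, 125]   # e.g. sector transaction volume
search_interest = [40, 45, 43, 50, 55, 52]      # e.g. a search trend index

X, y = make_lag_features(transactions, search_interest, n_lags=2)
for features, target in zip(X, y):
    print(features, "->", target)
```

Using only values from before time t in each row avoids leaking the future into the features, which matters when the goal is to anticipate a trend rather than describe it after the fact.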

As a conclusion about the assumptions at both extremes, we have come to understand that the most compelling solutions come from interaction, dialogue, knowledge and understanding on both sides: industry and science.

Our proposal, “we design new futures through data”, represents generating value from information to conceive new products and services that understand and meet the needs of target audiences.

To achieve this goal, at Grupodot we have an area dedicated exclusively to data analysis, made up of various profiles such as physicists, designers, economists, advertisers, engineers and philosophers. All of them contribute, from their fields of knowledge, to the analysis, design and construction of different types of products where artificial intelligence techniques are applied.