Data Science Methodology

Shelomi Priskila
4 min read · Jul 30, 2020

As computing power has increased, access to data has grown steadily. Data scientists solve data-related problems through a structured decision-making process. Data science methodology is an iterative method that guides them toward the best solution for a given problem.

1 . Business understanding

Data science methodology begins with taking the time to seek clarification and reach what can be called a business understanding. This stage comes first because clarity about the problem to be solved determines which data will be used to answer the question. Establishing a clearly defined question starts with understanding the goal of the person asking it.

2 . Analytic approach

Once the problem is defined and the question is well understood, the appropriate analytic approach is selected in the context of the business requirements. This is the second stage of the data science methodology. Selecting an approach means identifying what type of patterns will be needed to address the question most effectively. If the question is to determine the probability of an action, a predictive model might be used. If the question is to show relationships, a descriptive approach may be required, for example one that looks at clusters of similar activities based on events and preferences.

There are four types of approach:

  • Descriptive approach: describes the current status based on the information provided.
  • Diagnostic approach: uses statistical analysis to show what is happening and why it is happening.
  • Predictive approach: focuses on trends or the probability of future events.
  • Prescriptive approach: recommends how the problem should actually be solved.

3 . Data requirements

Building on the understanding of the problem and the analytic approach selected, the data scientist is ready to get started. In this phase, one should find answers to questions such as ‘what’, ‘where’, ‘when’, ‘why’, ‘how’ and ‘who’.

4 . Data collection

In this phase the data requirements are revised and a decision is made as to whether the collection requires more or less data. By this stage, the data scientist will have a good understanding of what they will be working with. Gaps in the data are identified, and plans are made to either fill them or make substitutions. This stage follows on from the data requirements stage.
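As a minimal sketch of how gaps in collected data might be identified, the snippet below compares a set of required fields against a handful of records and reports the share of records missing each field. The field names and sample records are hypothetical, chosen only for illustration.

```python
# Hypothetical requirements: the fields the analysis is assumed to need.
REQUIRED_FIELDS = {"customer_id", "age", "region", "purchase_amount"}


def find_gaps(records, required=REQUIRED_FIELDS):
    """Return, per required field, the share of records where it is absent or None."""
    total = len(records)
    if total == 0:
        return {}
    gaps = {}
    for field in sorted(required):
        missing = sum(1 for r in records if r.get(field) is None)
        if missing:
            gaps[field] = missing / total
    return gaps


# Illustrative sample data (not from a real source).
collected = [
    {"customer_id": 1, "age": 34, "region": "north", "purchase_amount": 120.0},
    {"customer_id": 2, "age": None, "region": "south", "purchase_amount": 80.5},
    {"customer_id": 3, "age": 51, "purchase_amount": 42.0},
    {"customer_id": 4, "age": 29, "region": "east", "purchase_amount": 15.0},
]

gaps = find_gaps(collected)
print(gaps)
```

A report like this can feed the decision on whether to collect more data or substitute another source for a field with many gaps.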

5 . Data understanding

Data understanding encompasses all activities related to constructing the data set. This stage of the data science methodology answers the question: is the data that you collected representative of the problem to be solved?
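One common way to start judging whether data is representative is to look at descriptive statistics for each column. The sketch below uses only Python's standard `statistics` module; the `summarize` helper and the `ages` sample are hypothetical and stand in for whatever columns a real project would have.

```python
import statistics


def summarize(values):
    """Basic descriptive statistics for one numeric column, ignoring missing values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(present) if len(present) > 1 else 0.0,
    }


# Illustrative column with two missing entries.
ages = [34, 29, None, 51, 45, 38, None, 62]
summary = summarize(ages)
for name, value in summary.items():
    print(name, value)
```

If the range or the number of missing values looks wrong for the population the question is about, that is a signal to go back to data collection.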

6 . Data preparation

In a sense, data preparation is similar to washing freshly picked vegetables, insofar as unwanted elements such as dirt or imperfections are removed. Together with data collection and data understanding, data preparation is the most time-consuming phase of a data science project, typically taking seventy percent, and sometimes up to ninety percent, of the overall project time. Specifically, the data preparation stage of the methodology answers the question: what are the ways in which data is prepared? To work effectively with the data, it must be prepared in a way that addresses missing or invalid values and removes duplicates, so that everything is properly formatted.
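The three preparation tasks named above (missing values, invalid values, duplicates) can be sketched in a few lines of plain Python. The record layout, the rule that a negative amount is invalid, and the choice to fill missing ages with the median are all assumptions made for this example, not fixed rules.

```python
import statistics


def prepare(records):
    """Drop duplicate and invalid rows, then fill missing ages with the median age."""
    seen = set()
    cleaned = []
    for r in records:
        cid = r.get("customer_id")
        if cid is None or cid in seen:
            continue  # skip rows without an id and rows repeating an id already kept
        amount = r.get("purchase_amount")
        if amount is None or amount < 0:
            continue  # assumption: missing or negative amounts are invalid
        seen.add(cid)
        cleaned.append(
            {"customer_id": cid, "age": r.get("age"), "purchase_amount": float(amount)}
        )

    # Fill missing ages with the median of the ages that are present.
    ages = [r["age"] for r in cleaned if r["age"] is not None]
    fill = statistics.median(ages) if ages else None
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned


# Illustrative raw data: one duplicate, one missing age, one invalid amount.
raw = [
    {"customer_id": 1, "age": 34, "purchase_amount": 120},
    {"customer_id": 1, "age": 34, "purchase_amount": 120},
    {"customer_id": 2, "age": None, "purchase_amount": 80.5},
    {"customer_id": 3, "age": 51, "purchase_amount": -5},
    {"customer_id": 4, "age": 29, "purchase_amount": 15},
]

cleaned = prepare(raw)
for row in cleaned:
    print(row)
```

Whether to drop, fill or flag a problematic value depends on the business question, so each of these rules should be revisited per project.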

7 . Modelling

In this stage, the data scientist will play around with different algorithms to check that the variables in play are actually required. The success of data compilation, preparation and modelling depends on the understanding of the problem at hand and on the appropriate analytic approach being taken. This phase focuses on building predictive or descriptive models.
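One simple way to check whether a variable is actually required is to compare a model that uses it against a baseline that ignores it. The sketch below fits a one-variable least-squares line and compares its error on held-out data with that of a constant (mean) prediction. The data points are made up for illustration, and least squares is just one possible algorithm here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(predictions, actual):
    """Mean squared error between predictions and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actual)) / len(actual)


# Illustrative training and held-out data.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
test_x = [7, 8]
test_y = [14.1, 15.9]

slope, intercept = fit_line(train_x, train_y)
baseline = sum(train_y) / len(train_y)  # model that ignores x entirely

line_mse = mse([slope * x + intercept for x in test_x], test_y)
base_mse = mse([baseline] * len(test_x), test_y)
print("line model MSE:", line_mse)
print("baseline MSE:", base_mse)
```

If the model using the variable does not clearly beat the baseline on held-out data, the variable may not be needed.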

8 . Evaluation

Model evaluation is done during model development. Evaluation allows the quality of the model to be assessed, but it is also an opportunity to see whether the model meets the initial request. Evaluation answers the question: does the model really answer the initial question, or does it need to be adjusted? Model evaluation can have two main phases. The first is the diagnostic measures phase, which is used to ensure the model is working as intended. The second, which may also be used, is statistical significance testing.
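For a binary classifier, typical diagnostic measures include accuracy, precision and recall, all derived from the confusion matrix. The sketch below computes them in plain Python. The labels are invented for the example, and which measures matter most depends on the business question.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def diagnostics(actual, predicted):
    """Accuracy, precision and recall from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative labels: what really happened vs. what the model predicted.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = diagnostics(actual, predicted)
print(metrics)
```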

9 . Deployment

Once the model is evaluated and the data scientist is confident it will work, it is deployed and put to the ultimate test. Depending on the purpose of the model, it may be rolled out to a limited group of users or in a test environment, to build up confidence in applying the outcome for use across the board.
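One possible way to roll a model out to a limited group of users is to assign each user to a stable bucket and enable the model only for a chosen percentage of buckets. The hashing scheme below is an assumption made for illustration, not something this methodology prescribes, and `in_rollout` is a hypothetical helper.

```python
import hashlib


def in_rollout(user_id, percent):
    """Deterministically decide whether a user falls in the rollout group."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


# Illustrative user ids; a real system would use its own identifiers.
users = range(1000)
selected_10 = [u for u in users if in_rollout(u, 10)]
selected_50 = [u for u in users if in_rollout(u, 50)]
print(len(selected_10), "of", len(users), "users get the new model at 10%")
```

Because the bucket depends only on the user id, a user who sees the model at 10 percent still sees it when the rollout widens to 50 percent.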

10 . Feedback

Feedback from users helps to refine the model and to assess its performance and impact. The value of the model depends on successfully incorporating feedback and making adjustments for as long as the solution is required.
