Skip to content

Applied regression analysis to predict the spread of COVID-19 by carefully selecting features like daily cases, mobility data, and public health interventions.Included preprocessing steps such as feature engineering, imputation of missing data, and normalization, evaluated using metrics MSE and R², with cross-validation

Notifications You must be signed in to change notification settings

srinikha193/Covid19_EDA-Modeling

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 

Repository files navigation

Covid19_EDA-Modeling

In the project, we employed regression models as a fundamental technique to analyze and predict the spread of COVID-19 based on historical data. The key steps and considerations in this analysis included:

Data Preprocessing:

Feature Engineering: We identified critical features, including daily confirmed cases, daily deaths, recovered cases, mobility data, and public health interventions. These features were carefully selected to capture the underlying factors influencing COVID-19 spread. Handling Missing Data: Missing values in the dataset were addressed through imputation techniques. For continuous variables, we employed interpolation or mean imputation methods, while for categorical variables, mode imputation was used. Normalization: To ensure that all features contributed equally to the model, continuous variables were normalized using min-max scaling or z-score standardization. Model Selection:

We explored multiple regression models, including Linear Regression and Polynomial Regression, to identify the best fit for the data. Linear Regression was initially used to model the relationship between the independent variables (e.g., mobility data, intervention measures) and the dependent variable (daily confirmed cases). This model helped in understanding the linear relationship and was a baseline for further analysis. Polynomial Regression was then employed to capture any non-linear relationships in the data. The degree of the polynomial was carefully chosen based on cross-validation techniques to avoid overfitting. Model Evaluation:

The performance of the regression models was evaluated using metrics such as Mean Squared Error (MSE), R-squared (R²), and Root Mean Squared Error (RMSE). Cross-validation was employed to ensure the robustness of the model, where the data was split into training and testing sets. The model's performance on unseen data was crucial in assessing its generalization capabilities. Model Interpretation:

The coefficients of the regression models were analyzed to understand the impact of each feature on the predicted outcomes. For instance, mobility data and specific public health interventions were found to have significant coefficients, indicating their strong influence on the spread of the virus. Residual analysis was performed to check for any patterns that might suggest a poor fit or the need for further feature engineering. Time Series Component:

Given the temporal nature of the data, the regression models were also integrated with time-lagged features. This approach helped in capturing the delayed effects of interventions and changes in mobility on COVID-19 spread.

About

Applied regression analysis to predict the spread of COVID-19 by carefully selecting features like daily cases, mobility data, and public health interventions.Included preprocessing steps such as feature engineering, imputation of missing data, and normalization, evaluated using metrics MSE and R², with cross-validation

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published