Post

Endogeneity in statistical inference and forecasting

This project is about statistical inference and prediction in the face of identification problems (endogeneity). You can expect:
- data generation according to a custom Data Generation Process
- analysis of endogeneity problems (positive and negative mediators, confounders)
- OLS models
- instrumental variables
- simulation
- notebook with Python code
- project report
- GitHub repository

P.S. Because I couldn't find a suitable free photo for the background, I created it myself :)
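To illustrate the kind of simulation listed above, here is a minimal sketch (not the project's actual code) of a Data Generation Process with an unobserved confounder, comparing the OLS slope with a simple instrumental-variable (Wald) estimate. The coefficients, the variable names and the helper functions `simulate`, `ols_slope` and `iv_slope` are assumptions made up for this example.

```python
import random


def simulate(n=20000, beta=2.0, seed=0):
    """Draw (x, y, z) from a toy DGP with an unobserved confounder u.

    Hypothetical DGP: x = z + u + noise, y = beta * x + 3 * u + noise.
    z is a valid instrument: it moves x but affects y only through x.
    """
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # unobserved confounder
        z = rng.gauss(0, 1)  # instrument
        x = z + u + rng.gauss(0, 1)
        y = beta * x + 3.0 * u + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def cov(a, b):
    """Sample covariance (population version, divides by n)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(x, y, z):
    """IV (Wald) estimate of the slope, using z as the instrument for x."""
    return cov(z, y) / cov(z, x)


if __name__ == "__main__":
    x, y, z = simulate()
    print("OLS slope:", round(ols_slope(x, y), 3))
    print("IV slope: ", round(iv_slope(x, y, z), 3))
```

In this toy setup the OLS slope converges to 3 rather than the true value 2, because x is correlated with the omitted confounder, while the IV estimate stays close to 2.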

Cross-sectional tabular data modelling with CNNs

This project is a very big part of my Master's thesis at WNE UW. You can expect:
- data preprocessing (missing-value detection and imputation with a previously developed RF-based algorithm, data manipulation, image to vector conversion using Monte Carlo simulation)
- data analysis and transforms (balance check, distribution analysis (box plots and histograms), outlier reduction using quantile clipping, Yeo-Johnson power transform, normalization, special normalization variant for CNNs)
- n-fold cross-validation study on the training set and testing on the test set
- several algorithms (Logits, Random Forests, XGBoosts, Feedforward Neural Networks, Convolutional Neural Networks)
- hyperparameter optimization for Logits (regularization only), RFs and XGBs
- network building (experiments based on results and learning curves)
- optimized Inception modules for CNNs
- automatic feature generation method for enlarging CNN inputs, based on repeatedly composing sampled features and arithmetic operations drawn from created discrete probability distributions
- CV results comparison (Wilcoxon testing)
- model training on the entire training set (specifications chosen in CV)
- model testing and comparison
- study on 5 datasets: company bankruptcy prediction for 1-5 year forecast horizons (in progress)
- Python notebooks in English (a ton of Python code)
- GitHub repository

WIG index volatility modelling

This project is a comparison of volatility models. You can expect:
- working on real market data (from stooq.pl)
- time series EDA (logarithmic returns transform, realization plots, ACFs, PACFs, descriptive statistics)
- statistical tests (ARCH LM, Jarque-Bera)
- GARCH modelling with prior assumptions about the epsilon distribution (hypotheses)
- 4 GARCH models (standard, exponential, threshold, component) with 4 different epsilon distributions each (normal, Student's t, skewed Student's t, generalized error)
- 3 naive models (random walk, historical average, moving average)
- 9 performance metrics (ME, MAE, RMSE, AMAPE, TIC, MME(U), MME(O), DCP, DCPU)
- quality paper-style report in Polish
- GitHub repository
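The naive benchmarks and the simplest error metrics from the list above can be sketched roughly as follows. This is an illustration, not the report's code: it assumes squared log returns as the volatility proxy, and the function names and the `window` parameter are made up for the example.

```python
import math


def log_returns(prices):
    """Logarithmic returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def naive_forecasts(proxy, window=3):
    """One-step-ahead naive forecasts of proxy[t] for t >= window.

    Each forecast uses only observations strictly before t:
    random walk (last value), historical average (all past values),
    moving average (last `window` values).
    """
    random_walk, hist_avg, moving_avg = [], [], []
    for t in range(window, len(proxy)):
        random_walk.append(proxy[t - 1])
        hist_avg.append(sum(proxy[:t]) / t)
        moving_avg.append(sum(proxy[t - window:t]) / window)
    return random_walk, hist_avg, moving_avg


def metrics(actual, forecast):
    """Mean error, mean absolute error and root mean squared error."""
    errors = [f - a for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
    }


if __name__ == "__main__":
    prices = [100.0, 101.0, 99.5, 100.2, 102.0, 101.1, 100.7, 103.0]
    proxy = [r * r for r in log_returns(prices)]  # assumed volatility proxy
    window = 3
    actual = proxy[window:]
    names = ("random walk", "historical average", "moving average")
    for name, forecast in zip(names, naive_forecasts(proxy, window)):
        print(name, metrics(actual, forecast))
```

The prices in the usage block are invented numbers, there only to make the script runnable.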

Rainfall modelling with OLS and Kernel Regression

This project is about rainfall modelling and assumptions testing. You can expect:
- working on real data from the Polish Institute of Meteorology and Water Management
- data cleaning and analysis
- econometric modelling (OLS, Kernel Regression)
- OLS assumptions testing (RESET (several alternatives), Breusch-Pagan, Breusch-Godfrey, Jarque-Bera, Rescaled Moments, VIFs)
- model selection (information criteria)
- performance measurement (MSE, RMSE, MAE, MAPE, R squared and adjusted R squared, F test, scatterplots)
- data transformations (Principal Component Analysis, Box-Cox power transform)
- "forecasting" (see the report for why the quotation marks are used ;))
- discussion about endogeneity
- Python notebook (and many HTML files with models for many weather stations)
- quality paper-style report in Polish
- GitHub repository

Cryptocurrencies portfolio

This project is about conditional variance function and Value at Risk estimation for a cryptocurrency portfolio. You can expect:
- cryptocurrency portfolio building with the market capitalization weighting principle
- data scraping from coinmarketcap.com
- time series EDA (realizations plot, returns plot, ACFs, PACFs, returns distribution, descriptive statistics)
- hypothesis testing (Durbin-Watson, ARCH LM, Jarque-Bera, Ljung-Box tests)
- time series modelling (ARMA-GARCH family models, residuals analysis, model selection, rolling window estimation)
- Value at Risk estimation, sensitivity analysis
- forecasting
- R Markdown report in Polish
- GitHub repository
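As a rough illustration of two steps from the list above, the sketch below builds market-capitalization weights and computes a Value at Risk figure. Note the swap: the project is written in R and estimates VaR from ARMA-GARCH models, whereas this example uses plain historical simulation (an empirical quantile) in Python to keep it short. All function names and numbers are assumptions for illustration only.

```python
import math


def cap_weights(market_caps):
    """Portfolio weights proportional to market capitalization."""
    total = sum(market_caps.values())
    return {name: cap / total for name, cap in market_caps.items()}


def portfolio_returns(weights, asset_returns):
    """Weighted sum of asset returns for each period (fixed weights)."""
    n_periods = len(next(iter(asset_returns.values())))
    return [
        sum(weights[name] * asset_returns[name][t] for name in weights)
        for t in range(n_periods)
    ]


def historical_var(returns, alpha=0.05):
    """Historical-simulation VaR: the empirical alpha-quantile of returns,
    reported as a positive loss."""
    ordered = sorted(returns)
    index = max(0, math.ceil(alpha * len(ordered)) - 1)
    return -ordered[index]


if __name__ == "__main__":
    caps = {"AAA": 300.0, "BBB": 100.0}  # hypothetical coins and caps
    returns = {
        "AAA": [0.04, -0.08, 0.01, 0.02, -0.03, 0.05, 0.00, -0.01],
        "BBB": [0.00, 0.04, -0.06, 0.03, 0.02, -0.02, 0.01, 0.07],
    }
    weights = cap_weights(caps)
    port = portfolio_returns(weights, returns)
    print("weights:", weights)
    print("25% historical VaR:", historical_var(port, alpha=0.25))
```

Historical simulation ignores volatility clustering, which is exactly what a conditional variance model is meant to capture, so treat it only as a baseline.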

Churn Modelling

This project is about churn modelling (a binary classification task) with Machine Learning and Neural Networks in Python. You can expect:
- statistical analysis (Student's t and Jarque-Bera tests, correlations, Mutual Information)
- feature selection (feature importances, sensitivity, statistical methods)
- feature engineering (monotonic transforms, binarization, label encoding, interactions)
- lots of ML algorithms (Logits, Random Forests, XGBoosts, SVMs, kNNs, Neural Nets, ensembling)
- n-fold cross-validation study
- bootstrap simulation
- several notebooks in Polish (lots of Python code)
- GitHub repository

Companies bankruptcy modelling with econometric methods

This project is about modelling the probability of company bankruptcy with econometric methods. You can expect:
- paper-style report (in Polish) with quality tables, literature review, methodology description etc.
- statistical analysis (distributions, correlations, VIFs)
- econometric modelling (linear probability model, logit regression, probit regression)
- hypothesis testing (Student's t, z tests, linktest)
- marginal effects (computation, interpretation)
- ROC curves
- cutoff optimization
- bootstrap simulation (Altman Z-Score follow-up)
- huge Jupyter Notebook in Polish (lots of Python code)
- GitHub repository
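One common way to optimize a classification cutoff from ROC information is to maximize Youden's J statistic (true positive rate minus false positive rate). Whether the report uses this criterion is an assumption; the sketch below only shows the general idea, and the function name `best_cutoff` and the toy data are hypothetical.

```python
def best_cutoff(y_true, scores):
    """Return (cutoff, J) maximizing Youden's J = TPR - FPR.

    An observation is classified as positive when score >= cutoff.
    Candidate cutoffs are the distinct observed scores; ties in J keep
    the lowest cutoff.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_c, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        true_pos = sum(
            1 for y, s in zip(y_true, scores) if s >= cutoff and y == 1
        )
        false_pos = sum(
            1 for y, s in zip(y_true, scores) if s >= cutoff and y == 0
        )
        j = true_pos / positives - false_pos / negatives
        if j > best_j:
            best_c, best_j = cutoff, j
    return best_c, best_j


if __name__ == "__main__":
    # Toy labels (1 = bankrupt) and predicted probabilities.
    labels = [0, 0, 1, 1]
    probabilities = [0.1, 0.4, 0.35, 0.8]
    print(best_cutoff(labels, probabilities))
```

Other criteria, such as minimizing a cost-weighted error, lead to different cutoffs, so the choice should follow the relative cost of missing a bankruptcy versus raising a false alarm.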

BMI distribution estimation

This project is about BMI distribution estimation. You can expect:
- Maximum Likelihood vs Generalized Method of Moments distribution estimation
- Gaussian vs Weibull dilemma for BMI
- hypothesis testing (LR, Wald, z tests)
- WHO standards
- report in Polish
- GitHub repository