Cross-sectional tabular data modelling with CNNs

This project is a major part of my Master's thesis at WNE UW. You can expect:
- data preprocessing: missing-value detection and imputation with a previously developed RF-based algorithm, data manipulation, image-to-vector conversion using Monte Carlo simulation
- data analysis and transforms: balance check, distribution analysis (box plots and histograms), outlier reduction using quantile clipping, Yeo-Johnson power transform, normalization, and a special normalization variant for CNNs
- an n-fold cross-validation study on the training set, plus evaluation on the test set, for several algorithms (logit models, Random Forests, XGBoost, feedforward neural networks, convolutional neural networks)
- hyperparameter optimization for logit models (regularization only), RFs and XGBoost
- network building (experiments based on results and learning curves)
- optimized Inception modules for CNNs
- an automatic feature generation method for enlarging CNN inputs, based on repeatedly composing sampled features and arithmetic operations drawn from constructed discrete probability distributions
- comparison of CV results (Wilcoxon testing)
- training the models on the entire training set (specifications chosen in CV)
- model testing and comparison
- a study on 5 datasets: company bankruptcy prediction for 1 to 5 year forecast horizons (in progress)
- Python notebooks in English (a ton of Python code)
- GitHub repository
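
This is a minimal Python sketch of two of the transforms listed above: quantile clipping followed by a Yeo-Johnson transform. It is not the notebook code. The function names, the 5%/95% clipping quantiles, the toy data and the fixed λ = 0.5 are illustrative assumptions. In practice λ is usually estimated by maximum likelihood, which is what scikit-learn's `PowerTransformer(method="yeo-johnson")` does. The clipping bounds are fitted on the training data only and then reused.

```python
import math


def quantile(values, q):
    """Quantile with linear interpolation between order statistics
    (the same convention as NumPy's default)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fit_clip_bounds(train_values, lower_q=0.01, upper_q=0.99):
    """Estimate clipping bounds on the training data only, to avoid leakage."""
    return quantile(train_values, lower_q), quantile(train_values, upper_q)


def clip(values, bounds):
    """Replace values outside [low, high] with the nearest bound."""
    low, high = bounds
    return [min(max(v, low), high) for v in values]


def yeo_johnson(y, lam):
    """Yeo-Johnson transform of one value for a fixed lambda.
    Unlike Box-Cox, it is also defined for zero and negative values."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)
        return ((y + 1) ** lam - 1) / lam
    if abs(lam - 2) < 1e-12:
        return -math.log1p(-y)
    return -((1 - y) ** (2 - lam) - 1) / (2 - lam)


# Hypothetical skewed feature, e.g. a financial ratio with one extreme value.
train = [-4.0, -1.2, -0.3, 0.0, 0.4, 0.9, 1.5, 2.2, 3.1, 95.0]
bounds = fit_clip_bounds(train, 0.05, 0.95)
clipped = clip(train, bounds)
transformed = [yeo_johnson(v, 0.5) for v in clipped]
```

Clipping comes first so that a single extreme value does not dominate the shape the power transform has to correct. Normalization would then be applied to the transformed values.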

WIG index volatility modelling

This project compares volatility models. You can expect:
- work on real market data (from stooq.pl)
- time series EDA (logarithmic returns transform, realization plots, ACFs, PACFs, descriptive statistics)
- statistical tests (ARCH LM, Jarque-Bera)
- GARCH modelling with prior hypotheses about the distribution of the error term (epsilon)
- 4 GARCH models (standard, exponential, threshold, component), each with 4 different error distributions (normal, Student's t, skewed Student's t, generalized error)
- 3 naive models (random walk, historical average, moving average)
- 9 performance metrics (ME, MAE, RMSE, AMAPE, TIC, MME(U), MME(O), DCP, DCPU)
- a quality paper-style report in Polish
- GitHub repository
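
This sketch shows the core pieces of the pipeline above: log returns, the conditional variance recursion of a standard GARCH(1,1), and three of the listed error metrics.

Several parts are assumptions:
- The parameters `omega`, `alpha` and `beta` are hypothetical fixed values. In the project they are estimated by maximum likelihood, for example with the `arch` package's `arch_model`.
- The price series is made up.
- Errors are defined as actual minus forecast. Sign conventions for ME vary.
- Squared demeaned returns stand in as the volatility proxy.

```python
import math


def log_returns(prices):
    """r_t = 100 * ln(P_t / P_{t-1}), i.e. log returns in percent."""
    return [100.0 * math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def garch11_variance(returns, omega, alpha, beta):
    """In-sample conditional variances of a standard GARCH(1,1):
    sigma2_t = omega + alpha * eps_{t-1}**2 + beta * sigma2_{t-1},
    where eps_t is the demeaned return and sigma2_0 is the sample variance."""
    mu = sum(returns) / len(returns)
    eps = [r - mu for r in returns]
    sigma2 = [sum(e * e for e in eps) / len(eps)]
    for e_prev in eps[:-1]:
        sigma2.append(omega + alpha * e_prev ** 2 + beta * sigma2[-1])
    return sigma2


def forecast_errors(actual, forecast):
    """ME, MAE and RMSE with errors defined as actual - forecast (an assumption)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
    }


# Hypothetical closing prices; alpha + beta < 1 keeps the process covariance-stationary.
prices = [100.0, 101.2, 99.8, 100.5, 102.3, 101.1, 103.0, 102.4]
r = log_returns(prices)
sigma2 = garch11_variance(r, omega=0.05, alpha=0.08, beta=0.90)
mean_r = sum(r) / len(r)
proxy = [(x - mean_r) ** 2 for x in r]  # squared demeaned returns as a volatility proxy
print(forecast_errors(proxy, sigma2))
```

A real comparison would evaluate out-of-sample one-step-ahead forecasts rather than these in-sample filtered variances. The same `forecast_errors` call can then score the naive benchmarks.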

Can PCA extract important information from non-significant features? The neural network case

This project is about boosting neural networks with PCA (with other ML algorithms as benchmarks). You can expect:
- data preparation (renaming labels, balance check, standardization)
- development of a Random Forest based data imputation algorithm
- an n-fold cross-validation study
- machine learning algorithms (Random Forest, XGBoost with hyperparameter optimization)
- 6 feature selection methods to spot non-significant features (RF importance, mutual information, Spearman correlation among features, Spearman correlation with the target, the general-to-specific econometric procedure, Lasso logistic regression)
- neural network development (architecture, optimizers, activations, regularization, dropout, batch norm, hyperparameters)
- principal component analysis of the dataset
- PCA integration with the networks in CV
- hypothesis verification using the Wilcoxon test for equality of medians
- model comparison
- Python notebook in English (a ton of Python code)
- quality paper-style report in English
- project presentation in English
- GitHub repository
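
The Wilcoxon step can be sketched as an exact two-sided signed-rank test on paired per-fold CV scores. This assumes the paired signed-rank variant, which tests whether the median of the paired differences is zero. Zero differences are dropped, and ties in |d| get average ranks. Ranks are doubled so the exact null distribution can be counted with integer dynamic programming.

The function names and the fold scores are hypothetical. `scipy.stats.wilcoxon(x, y)` is the usual library route, but its defaults for zeros and ties may differ from this sketch.

```python
def average_ranks(values):
    """1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.
    Returns (W+, p-value)."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(v) for v in d])
    r2 = [round(2 * r) for r in ranks]  # doubled ranks are integers even with ties
    w_plus2 = sum(r for r, v in zip(r2, d) if v > 0)
    # counts[s] = number of sign patterns (out of 2**n) whose doubled W+ equals s.
    total = sum(r2)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in r2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    n_patterns = 2 ** n
    p_low = sum(counts[: w_plus2 + 1]) / n_patterns
    p_high = sum(counts[w_plus2:]) / n_patterns
    return w_plus2 / 2, min(1.0, 2 * min(p_low, p_high))


# Hypothetical per-fold AUCs of two models evaluated on the same 10-fold split.
auc_net = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.83]
auc_net_pca = [0.82, 0.80, 0.83, 0.82, 0.83, 0.80, 0.85, 0.81, 0.80, 0.84]
stat, p = wilcoxon_signed_rank(auc_net_pca, auc_net)
```

The test is paired because both models are scored on identical folds. This removes fold-to-fold difficulty from the comparison.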

Cryptocurrency portfolio

This project is about estimating the conditional variance function and Value at Risk for a cryptocurrency portfolio. You can expect:
- building a cryptocurrency portfolio with market-capitalization weighting
- data scraping from coinmarketcap.com
- time series EDA (realization plots, returns plots, ACFs, PACFs, returns distribution, descriptive statistics)
- hypothesis testing (Durbin-Watson, ARCH LM, Jarque-Bera, Ljung-Box tests)
- time series modelling (ARMA-GARCH family models, residual analysis, model selection, rolling-window estimation)
- Value at Risk estimation and sensitivity analysis
- forecasting
- R Markdown report in Polish
- GitHub repository
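
The sketch below illustrates two steps from the list above: market-capitalization weighting and a one-period parametric VaR computed from a conditional mean and volatility. The project's report is in R, so this Python version is only an illustration.

Several parts are assumptions:
- The asset names, capitalizations, returns and the `mu`/`sigma` values are hypothetical.
- `sigma` stands in for an ARMA-GARCH one-step-ahead volatility forecast.
- The normal quantile is used. A model with Student's t innovations would use the t quantile instead.

```python
from statistics import NormalDist


def market_cap_weights(caps):
    """Weight each asset by its share of total market capitalization."""
    total = sum(caps.values())
    return {name: cap / total for name, cap in caps.items()}


def portfolio_returns(weights, returns):
    """Per-period portfolio return as the weighted sum of asset returns.
    Exact for simple returns; only an approximation for log returns."""
    n = len(next(iter(returns.values())))
    return [sum(w * returns[name][t] for name, w in weights.items()) for t in range(n)]


def normal_var(mu, sigma, alpha=0.05):
    """One-period VaR at level alpha, reported as a positive loss:
    VaR = -(mu + sigma * z_alpha), z_alpha being the standard normal quantile."""
    return -(mu + sigma * NormalDist().inv_cdf(alpha))


# Hypothetical capitalizations (in billions) and daily simple returns in percent.
w = market_cap_weights({"BTC": 600.0, "ETH": 300.0, "XRP": 100.0})
rets = {"BTC": [1.2, -0.8, 0.5], "ETH": [2.0, -1.5, 0.3], "XRP": [-0.5, 3.0, -1.0]}
port = portfolio_returns(w, rets)
# In a rolling-window setup, mu and sigma would be re-forecast for every window.
var_95 = normal_var(mu=0.1, sigma=2.5, alpha=0.05)
```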

Company bankruptcy modelling with econometric methods

This project is about modelling the probability of company bankruptcy with econometric methods. You can expect:
- a paper-style report (in Polish) with quality tables, a literature review, a methodology description, etc.
- statistical analysis (distributions, correlations, VIFs)
- econometric modelling (linear probability model, logit regression, probit regression)
- hypothesis testing (Student's t-tests, z-tests, linktest)
- marginal effects (computation, interpretation)
- ROC curves
- cutoff optimization
- bootstrap simulation (Altman Z-Score follow-up)
- a huge Jupyter notebook in Polish (lots of Python code)
- GitHub repository
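
This sketch covers two of the listed steps. The first is the marginal effect of a continuous regressor in a logit model, dP/dx_j = β_j · p · (1 − p), evaluated at a single point such as the sample means. The second is cutoff selection on the ROC curve.

Several parts are assumptions:
- The coefficients, the evaluation point and the labels are hypothetical.
- The cutoff criterion is Youden's J (TPR − FPR). The project may have used a different criterion.
- With statsmodels, `LogitResults.get_margeff()` is the usual way to get these effects. It also covers average marginal effects.

```python
import math


def logit_prob(x, beta):
    """P(y = 1 | x) = 1 / (1 + exp(-x'beta)); x starts with 1 for the intercept."""
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-xb))


def logit_marginal_effects(x, beta):
    """dP/dx_j = beta_j * p * (1 - p) at point x, for continuous regressors;
    the intercept (index 0) is skipped."""
    p = logit_prob(x, beta)
    return [b * p * (1 - p) for b in beta[1:]]


def youden_cutoff(y_true, scores):
    """Threshold t maximizing Youden's J = TPR - FPR, predicting 1 when score >= t."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


beta = [-2.0, 1.5, -0.8]  # hypothetical intercept and two slope coefficients
x_mean = [1.0, 0.4, 1.1]  # leading 1 for the intercept, then hypothetical regressor means
print(logit_marginal_effects(x_mean, beta))
print(youden_cutoff([0, 0, 1, 1, 0, 1], [0.1, 0.3, 0.35, 0.8, 0.5, 0.6]))
```

Bankruptcy samples are often imbalanced, so the default 0.5 cutoff can classify almost every firm as healthy. This is why the cutoff is optimized on the ROC curve rather than fixed.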