# Feedback for RMT4ELM

Zhenyu

## Title

• I believe the title should not be too "general"; why not use "Extreme Learning Machine" directly? The words "Neural Networks" can be misleading. I think we should emphasise the following points:
• this is a theoretical work, in which we evaluate the performance of…
• we work on randomly connected, single-layer NNs (since in the future we may move on to multilayer…)
• we use RMT
• Question: I believe you would like to submit to a math journal (e.g. Journal of Multivariate Analysis), but those journals are rarely cited, no?

## Abstract

• TOO much notation, but it seems to be unavoidable; personally I have no better idea. Anyway, I think in the abstract we should try to use more text than symbols.
• It seems that you have to use this notation because you reintroduce the concept of ELM; is that really necessary?
• the sentence

… is an input data matrix of bounded operator norm

• the last sentence:

provide practical insights within the underlying mechanisms into play in random neural networks.

is not concrete; maybe adding some keywords like "activation function" will help and, most importantly, will attract "Machine Learning (ML)" people

## Introduction

• just for your information, I've found this paper: An ELM-based model for affective analogical reasoning, which discusses the superhuman performance of ELM in a specific application (a rather well-cited recent paper on ELM)
• OK, so in fact you do not really want to use (or limit yourself to) the word "ELM". Hmm, why not, but it is still a rather popular algorithm; imagine an "ML" person working on ELM, who may not realise that our paper is just what they need…
• For the part just above Section 2 (System Model), where you talk about our second result on the test MSE: since we will not really give the proof, I think we have to try to convince the reader that our results are true, and that the reason we do not prove them is just that we are not sure what assumptions to use, i.e. how to describe the link between $X$ and $\hat{X}$.

## System Model

• the sentence

… for some matrix $\beta$ to be designed, where $\sigma(Wx)$ is understood entry-wise

is a little redundant; you have just said that $\sigma$ is applied entry-wise, no?

• the sentence

which are regressed through $\beta$ against a known output matrix $Y$

• use \emph for "ridge-regressor"
• preliminary results on the behavior of the random matrix $Q$
• I notice that you try to avoid the notion of "activation function" for $\sigma$, why?
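
For reference, here is how I read the setup of this section, as a minimal NumPy sketch. It is only my understanding and not taken from the draft: the Gaussian $W$, the choice $\sigma = \tanh$, the $1/T$ normalisation, the regularisation parameter `gamma` and all dimensions are assumptions made for illustration.

```python
import numpy as np

def elm_ridge(X, Y, n=64, gamma=0.1, sigma=np.tanh, seed=0):
    """Random single-layer network with a ridge-regressed output layer.

    X: (p, T) input data, Y: (d, T) known outputs.
    The 1/T normalisation and the Gaussian W are assumptions.
    """
    rng = np.random.default_rng(seed)
    p, T = X.shape
    W = rng.standard_normal((n, p))      # random, untrained first layer
    S = sigma(W @ X)                     # sigma applied entry-wise, shape (n, T)
    # Resolvent-type matrix Q = (S^T S / T + gamma I_T)^{-1}, shape (T, T)
    Q = np.linalg.inv(S.T @ S / T + gamma * np.eye(T))
    beta = S @ Q @ Y.T / T               # ridge regressor, shape (n, d)
    return W, S, Q, beta

# Toy data, purely illustrative
rng = np.random.default_rng(1)
p, T, d, gamma = 5, 40, 2, 0.1
X = rng.standard_normal((p, T)) / np.sqrt(p)
Y = rng.standard_normal((d, T))

W, S, Q, beta = elm_ridge(X, Y, gamma=gamma)

# Training MSE computed directly, and through Q only
residual = Y.T - S.T @ beta
mse_direct = np.sum(residual ** 2) / T
mse_via_Q = gamma ** 2 / T * np.trace(Y @ Q @ Q @ Y.T)
```

With this normalisation the training residual equals $\gamma Q Y^T$, so the training MSE depends on the data only through $Q$, which I guess is why the preliminary results focus on $Q$.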

## Main Results

• maybe a very stupid question: you always use the term "deterministic equivalent" for $E[Q]$ or $E[QAQ]$ etc., but these are already deterministic; isn't it better to say "asymptotic equivalent" in this case?
• the rest of the main results are really well-written