In this video, I explain what convolutions and convolutional neural networks are, and introduce, in detail, one of the best and most used state-of-the-art CNN architectures in 2020: DenseNet.
Watch the Video Below
If you would like me to cover any other neural network architecture or research paper, please let me know in the comments!
References
- DenseNet paper:
- DenseNet on GitHub:
Follow me for more AI content
- LinkedIn:
- Twitter:
- Facebook:
- The best courses in AI:
- Join Our Discord channel, Learn AI Together:
Chapters
0:00 - Hey! Tap the Thumbs Up button and Subscribe. You'll learn a lot of cool stuff, I promise.
0:18 - The Convolutional Neural Networks
0:39 - A … convolution?
2:07 - Training a CNN
2:45 - The activation function: ReLU
3:20 - The pooling layers: Max-Pooling
4:40 - The state-of-the-art CNNs: A quick history
5:23 - The most promising CNN architecture: DenseNet
8:39 - Conclusion
Video Transcript
Facial recognition, targeted ads, image recognition, video analysis, anomaly detection: these are all powerful AI applications that you must have heard of at least once. But do you know what they all have in common? They all use the same type of neural network architecture: the convolutional neural network. CNNs are the most used type of neural network and the best for any computer vision application. Once you understand them, you are ready to dive into the field and become an expert.

Convolutional neural networks are a family of deep neural networks that mainly use convolutions to achieve the expected task. As the name says, convolution is the process where the original image, which is our input in a computer vision application, is convolved using filters that detect important small features of the image, such as edges. The network will autonomously learn the filters' values that detect important features to match the output we want to have, such as the name of the object in a specific image sent as input. These filters are basically squares of size 3x3 or 5x5, so they can detect the direction of an edge: left, right, up, or down, just like you can see in this image.

The process of convolution takes a dot product between the filter and the pixels it faces. Then the filter moves to the right and does it again, convolving the whole image. Once it is done, this gives us the output of the first convolution layer, which is called a feature map. Then we do the same thing with another filter, giving us many feature maps at the end, which are all sent into the next layer as input to produce again many other feature maps, until we reach the end of the network with extremely detailed, general information about what the image contains. There are many filters, and the numbers inside these filters are called the weights, which are the parameters trained during our training phase.

Of course, the network is not only composed of convolutions. In order to learn, we also need to add an activation function and a pooling layer between each convolution layer. Basically, these activation functions make possible the use of the backpropagation technique, which calculates the error between our guess and the real answer we were supposed to get, then propagates this error throughout the network, changing the weights of the filters based on this error. Once the propagated error reaches the first layer, another example is fed to the network and the whole learning process is repeated, thus iteratively improving our algorithm.

The activation function is responsible for determining the output of each convolution computation and reducing the complexity of our network. The most popular activation function is called the ReLU function, which stands for Rectified Linear Unit. It sets to zero any negative results, which are known to be harmful to the network, and keeps positive values the same. Having all these zeros makes the network much more efficient to train in computation time, since a multiplication with zero will always equal zero.

Then again, to simplify our network and reduce the number of parameters, we have the pooling layers. Typically, we use a two-by-two pixel window and take the maximum value of this window to make the first pixel of our new feature map. This is known as max pooling. We then repeat this process for the whole feature map, which reduces the x and y dimensions of the feature map, thus reducing the number of parameters in the network the deeper we get into it, all while keeping the most important information.
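To make these three operations concrete, here is a minimal NumPy sketch of a single convolution, ReLU, and 2x2 max-pooling step. The image and filter values are made up for illustration; in a real network, the filter weights are learned during training, as explained above.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the filter over the image, taking a dot product at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product between the filter and the pixels it faces
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

def relu(x):
    """ReLU: set negative values to zero, keep positive values unchanged."""
    return np.maximum(0, x)

def max_pool2d(x, size=2):
    """Keep the maximum value of each non-overlapping size-by-size window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A made-up 6x6 grayscale "image" and a classic 3x3 vertical-edge filter
image = np.random.rand(6, 6)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])  # responds strongly to vertical edges

out = max_pool2d(relu(conv2d(image, kernel)))
print(out.shape)  # (2, 2): 6x6 image -> 4x4 feature map -> 2x2 after max pooling
```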
These three layers, convolution, activation, and pooling, can be repeated multiple times in a network, which we call our conv layers, making the network deeper and deeper. Finally, there are the fully connected layers, which learn a non-linear function from the last pooling layer's outputs. They flatten the multi-dimensional volume that results from the pooling layers into a one-dimensional vector with the same total number of values. Then we use this vector in a small fully connected neural network with one or more layers, for image classification or other purposes, resulting in one output per image, such as the class of the object. Of course, this is the most basic form of convolutional neural networks.

There have been many different convolutional architectures since LeNet-5 by Yann LeCun in 1998 and, more recently, with the progress of GPUs, the first deep learning network applied in the most popular object recognition competition: the AlexNet network, in 2012. This competition is the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where the best object detection algorithms competed every year on the biggest computer vision dataset ever created: ImageNet. The field exploded right after that year, with new architectures beating the previous ones and always performing better, until today.

Nowadays, most state-of-the-art architectures perform similarly and have some specific use cases where they are better. You can see here a quick comparison of the most used architectures in 2020. This is why I will only cover my favorite network in this video, which is the one that yields the best results in my research: DenseNet. It is also the most interesting and promising CNN architecture, in my opinion. Please let me know in the comments if you would like me to cover any other type of network architecture.

The DenseNet family first appeared in 2016, in the paper called Densely Connected Convolutional Networks by Facebook AI Research. It is a family because it has many versions with different depths, ranging from 121 layers with 0.8 million parameters up to a version with 264 layers and 15.3 million parameters, which is smaller than the 101-layer-deep ResNet architecture, as you can see here.

The DenseNet architecture uses the same concepts of convolutions, pooling, and the ReLU activation function to work. The important detail and innovation in this network architecture are the dense blocks. Here is an example of a five-layer dense block. In these dense blocks, each layer takes all the preceding feature maps as input, thus helping the training process by alleviating the vanishing gradient problem.
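To make the dense connectivity concrete, here is a simplified PyTorch sketch of such a dense block, assuming 3x3 convolutions and a fixed growth rate. It is not the paper's exact implementation (which also uses 1x1 bottleneck convolutions), only the core idea of concatenating all preceding feature maps:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Simplified dense block: every layer receives the concatenation
    of all preceding feature maps as its input."""
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Each layer's input grows by `growth_rate` channels per preceding layer
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Concatenate all preceding feature maps along the channel axis
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)

# A five-layer dense block, as in the video's example
block = DenseBlock(num_layers=5, in_channels=64, growth_rate=32)
x = torch.randn(1, 64, 56, 56)   # batch of one 64-channel feature map
print(block(x).shape)            # torch.Size([1, 224, 56, 56]): 64 + 5 * 32 channels
```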
This vanishing gradient problem appears in really deep networks: they are so deep that, when we backpropagate the error into the network, this error is reduced at every step and eventually becomes zero. The dense connections basically allow the error to be propagated further without being reduced too much. These connections also encourage feature reuse and reduce the number of parameters for the same reason: since the network reuses the information of previous feature maps instead of generating more parameters, it accesses the network's collective knowledge and reduces the chance of overfitting, due to this reduction in total parameters. And as I said, this works extremely well, reducing the number of parameters by around five times compared to a state-of-the-art ResNet architecture with the same number of layers.

The original DenseNet family is composed of four dense blocks with transition layers, which do convolution and pooling as well, and a final classification layer if we are working on an image classification task such as the ILSVRC competition. The size of the dense blocks is the only thing that changes for each version of the DenseNet family, to make the network deeper.

Of course, this was just an introduction to convolutional neural networks and, more precisely, the DenseNet architecture. I strongly invite you to read further about these architectures if you want to make a well-thought-out choice for your application. The paper and GitHub links for DenseNet are in the description of the video. Please let me know if you would like me to cover any other architecture.

Please leave a like if you made it this far into the video, and since over 90% of you watching are not subscribed yet, consider subscribing to the channel so you don't miss any further news, clearly explained. Thank you for watching!
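As a quick way to explore the structure described above, here is a short Python sketch using the DenseNet-121 implementation that ships with torchvision (an alternative to the official repository linked in the references above, following the same design of four dense blocks separated by transition layers):

```python
import torch
from torchvision import models

# DenseNet-121 as implemented in torchvision: four dense blocks
# separated by transition layers (convolution plus pooling),
# followed by a final classification layer.
model = models.densenet121()

for name, _ in model.features.named_children():
    print(name)  # conv0, norm0, relu0, pool0, denseblock1, transition1,
                 # denseblock2, transition2, denseblock3, transition3,
                 # denseblock4, norm5

x = torch.randn(1, 3, 224, 224)  # one fake 224x224 RGB image
print(model(x).shape)            # torch.Size([1, 1000]): one score per ImageNet class
```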