Mirror of https://github.com/2martens/uni.git
[Masterproj] Polished paper review
Signed-off-by: Jim Martens <github@2martens.de>
@@ -79,7 +79,7 @@ maxnames=2
 
 \begin{document}
 
-\title{Master project: seminar report template}
+\title{Deep Sliding Shapes: A Review}
 \author{Jim Martens}
 
 \maketitle
@@ -91,9 +91,11 @@ recognition network to find the actual objects. In the end it produces 3D
 bounding boxes and outperforms 3D selective search and other state-of-the-art
 solutions.
 
-The paper presents the approach in an understandable manner. But the
-reproducibility of Deep Sliding Shapes is suboptimal as key information for
-such an endeavour is missing from the paper.
+The introduced approach has a remarkable high-level structure that is
+used in more recent networks as well. But the code implementation and the
+provided implementation details, or the lack thereof, make an independent
+reproduction of the results and an adaptation to other problems very difficult
+if not impossible.
 
 
 % Lists:
@@ -360,10 +362,23 @@ Overall the paper provides many illustrative figures that make it far easier
 to imagine the results of the introduced method and quite simply liven up the
 paper and make it friendlier to the eyes compared to an all-text paper.
 
-Lastly the paper provides many evaluation results that are understandable
+Furthermore the paper provides many evaluation results that are understandable
 largely without the main paper text and give a good overview of the performance
 of the proposed method compared to others.
 
+Aside from the paper-writing skills the authors clearly possess, the presented
+approach itself is also very good. It is an elegant idea to first reduce the
+search volume by applying a region proposal network and then use an object recognition
+network to do the heavy lifting. The usage of the 2D data is well thought out
+as well. This abstract idea of dealing with 3D data has persisted and is somewhat
+repeated by the Frustum PointNet\cite{Qi2017}, which uses the results of a 2D
+object detection network to determine the region in which the 3D object detection
+takes place. The object detection network not only provides the region in the form
+of bounding boxes but also the classification of the detected objects in the form
+of a $k$-dimensional vector. Though the specific implementation varies greatly, the abstract
+idea of region proposal, usage of 2D data and object detection/recognition at
+the end is visible in both Deep Sliding Shapes and the Frustum PointNet.
+
 % subsection positive_aspect (end)
 
 \subsection{Paper Weaknesses} % (fold)
@@ -371,26 +386,17 @@ of the proposed method compared to others.
 
 That said, there are things to criticize about this paper. The information about
 the network structure is spread over two figures and some sections of the paper
-with no guarantee that no information is missing. Furthermore no information
-regarding the training, validation and testing data split was available. While
-this implementation information does not have to be inside the paper proper, it
-should have been inside appendices to make an independent replication of results
-easier. Not directly a problem with the paper itself, the decision to implement
-a software framework from scratch rather than using a proven existing one like
-TensorFlow makes it more difficult to utilize the pretrained models which are
-indeed available.
-
-The evaluation sections are inconsistent in their structure. The first section
-about object proposal evaluation follows the rest of the paper and is written
-in continuous text. It describes the compared methods and then discusses the
-results. The second section regarding the object detection evaluation however
-is written completely differently. There is no continuous text and the compared
-methods are not really described. Instead the section is largely used to justify
-the chosen design. This would not even be a problem if there were an introductory
-text explaining their motivations for this kind of evaluation and guiding the
-reader through the process. Currently there is no explanation given why
-the detection evaluation starts with feature encoding and is followed by
-design justification.
+with no guarantee that no information is missing. The evaluation sections are
+inconsistent in their structure. The first section about object proposal evaluation
+follows the rest of the paper and is written in continuous text. It describes the
+compared methods and then discusses the results. The second section regarding the
+object detection evaluation however is written completely differently. There is no
+continuous text and the compared methods are not really described. Instead the
+section is largely used to justify the chosen design. This would not even be a
+problem if there were an introductory text explaining their motivations for this
+kind of evaluation and guiding the reader through the process. Currently there
+is no explanation given why the detection evaluation starts with feature encoding
+and is followed by design justification.
 
 Furthermore the motivations for the used data sets NYUv2 and SUN RGB-D are
 not quite clear. Which data set is used for what purpose and why? The text
@@ -398,6 +404,16 @@ mentions in one sentence that the amodal bounding boxes are obtained from
 SUN RGB-D without further explanation. It would have been advantageous
 if the actual process of this "obtaining" had been explained.
 
+Lastly no information regarding the training, validation and testing data split was
+available. While this implementation information does not have to be inside the
+paper proper, it should have been at least inside appendices to make an independent
+replication of results possible. Not directly a problem with the paper itself, the decision to
+implement a software framework from scratch (Marvin framework) rather than using
+a proven existing one like TensorFlow makes it more difficult to utilize the
+pretrained models which are indeed available and more importantly to adapt Deep
+Sliding Shapes to other data sets and problems. To top it all off, the available
+MATLAB "glue" code is not well documented.
+
 % subsection negative (end)
 
 % section review (end)
@@ -410,8 +426,17 @@ network and a joint 2D and 3D object recognition network. Experimental
 results show that this approach delivers better results than previous
 state-of-the-art methods.
 
+The proposed approach introduced an important general structure for networks
+working with 3D data that is roughly visible, at a high level, in more recent
+networks utilizing 3D data as well. In the practical sphere the custom code
+framework and the badly documented code make it very difficult to replicate the
+results independently or even adapt Deep Sliding Shapes to other problems.
+In short: Good theory, bad practical implementation.
+
 In future work this method should be compared to other 3D-centric object detection
-approaches like Frustum PointNet\cite{Qi2017}.
+approaches like Frustum PointNet\cite{Qi2017}. In particular, a structural comparison
+with other 3D approaches would be interesting to see if a best-practice structure
+is emerging for the handling of 3D data.
 
 \newpage
 \printbibliography