-
http://techbrahmana.blogspot.com/2008/04/pragma-commentlib-cool-way-to-indicate.html
-
Ideas on Surgery Video - [Research]
2008-04-30
1. Introduce discrimination into GPDM (D-GPDM); the classification is based on a CRF.
2. Introduce discrimination into GPDM (D-GPDM); the classification is taken care of by MIL. In MIL, we use the learned latent points as the features and learn a set of instance prototypes. The MIL learning is based on Diverse Density.
3. Use GPDM to learn the latent-space trajectory; classification is then done by matching trajectories in the latent space via the method in [1].
4. Introduce discrimination into GPDM; the classification is taken care of by Gaussian Process Classification (GPC). But it is hard to really innovate on the GPC part. GPC inference involves approximation approaches like (loopy) belief propagation, expectation propagation, variational approximations, or Monte Carlo sampling.
[1] M. Black and A. Jepson. A probabilistic framework for matching temporal trajectories: Condensation-based recognition of gestures and expressions. In ECCV, 1998.
-
I am reading dozens of papers for my surgery video project. Now I have the ideas, but I only have two months before my graduation to fully develop them, including coding, running experiments, analyzing the data, and publishing a paper if possible. That is very little time, so I really need to manage my time efficiently.
The papers that I have been reading
1. Conditional random field
1.1 Lafferty, J., Zhu, X. and Liu, Y. 2004. Kernel conditional random fields: Representation and clique selection. In Proc. Twenty-First International Conference on Machine Learning (ICML).
1.2 Sanjiv Kumar and Martial Hebert, Discriminative random fields, International Journal of Computer Vision, Volume 68, Number 2 / June, 2006, pp. 179-201.
1.3 C. Sutton, A. McCallum, An introduction to Conditional random fields for relational learning, Introduction to Statistical Relational Learning, Ed. Lise Getoor and Ben Taskar, the MIT press.
1.4 C. Sminchisescu, A. Kanaujia, Z. Li, D. Metaxas, Conditional models for contextual human motion recognition, in: IEEE International Conference on Computer Vision, Vol. 2, 2005, pp. 1808--1815.
1.5 B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16, 2003.
1.6 Ariadna Quattoni, Michael Collins and Trevor Darrell. Conditional Random Fields for Object Recognition. In Advances in Neural Information Processing Systems 17 (NIPS 2004), 2004.
2 Gaussian Process
2.1 Gaussian Process Regression, Chapter 2, Machine Learning for Pattern Recognition, M. Bishop, MIT Press.
2.2 Gaussian Process Classification, Chapter 3, Machine Learning for Pattern Recognition, M. Bishop, MIT Press.
2.3 N. Lawrence, Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models, Journal of Machine Learning Research 6, pp. 1783-1816, 2005.
2.4 Wang, J. M., Fleet, D. J., Hertzmann, A. Gaussian Process Dynamical Models for Human Motion. In IEEE Trans. PAMI. February, 2008. pp. 283-298.
2.5 Yasemin Altun, Thomas Hofmann and Alexander J. Smola. Gaussian process classification for segmenting and annotating sequences. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), 2004.
2.6 Y. Altun, T. Hofmann, and M. Johnson. Discriminative learning for label sequences via boosting. In Advances in Neural Information Processing Systems (NIPS*15), 2003.
2.7 R. Urtasun and T. Darrell. Discriminative Gaussian process latent variable model for classification. In 24th International Conference on Machine Learning, 2007.
3 Combining multiple data sources
3.1 M. Girolami, M. Zhong, Data Integration for classification problems employing gaussian process priors, NIPS 2006.
-
1. No need to construct the latent space
2. No need to learn the latent to ambient and/or ambient to latent mapping
3. Learning the correlation coefficients needs only very few parameters. GPLVM, GPDM and Spectral LVM need to tune lots of parameters.
4. The training for obtaining the correlation coefficients is much faster than for GPLVM and GPDM.
5. In latent variable models, we need to assume the parametric form of the probability distribution of the ambient variables given the latent variables. Our method does not need that assumption and is thus more general.
6. As long as there are structural relationships among the state variables, PLS is able to discover them.
Some disadvantages of GPLVM and GPDM
"GPLVM’s lack of latent space prior makes it somewhat agnostic in visual inference applications, where it is useful to penalize drifts from the manifold of typical configurations. "----- Kanaujia, ICCV 2007.
"Urtasun et al [22] use GPLVM to track walking based on image tracks of the human joints obtained using the WSL tracker of Jepson’s et al. For more expressive kinematic representations, and in order to compensate for GPLVM’s lack of latent space prior, the authors [22] use an augmented, constrained (latent, ambient) state for tracking. This is feasible but, once again, renders the state estimation problem high-dimensional." ----- Kanaujia, ICCV 2007.
References:
[22] R. Urtasun, D. Fleet, A. Hertzmann, and P. Fua. Priors for people tracking in small training sets. In ICCV, 2005.
-
I just tried using kernel PLS to extract the nonlinearities in the 3D human motion data, and then used the extracted coefficient matrix for prediction in the RBPF tracking framework. However, I found that the tracking error is very high. The reason, I think, could be:
1. The kernel Gram matrix is biased towards test points that are similar to the training points. That is, the learned coefficient matrix is useful only if the training samples are similar to the test points.
In order to verify the idea, I need to
1) Do predictions using the coefficients learned from kernel PLS, without integrating them into the tracking framework, and analyze the prediction errors obtained from different training data points.
2) Try to use training motion cycles that are 'similar' to the test motions. That only works if we have the test motions available.
Some unresolved questions:
1) Do I need to make the data zero-mean or not? I tried two data sets and found that non-centered kernel PLS is better, though still worse than PLS.
2) How to determine the kernel parameter? For now, sigma (the RBF kernel parameter in kernel PLS) is determined by cross-validation.
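For the zero-mean question, centering in kernel methods is usually applied to the Gram matrix rather than to the raw data. The following is a minimal sketch of that step, not the code used in the experiments; the function names, the toy points, and the RBF convention exp(-d^2 / (2 sigma^2)) are assumptions made for illustration.

```python
import math

def rbf_gram(X, sigma):
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) (assumed convention)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return K

def center_gram(K):
    """Center the data in feature space: Kc = K - 1K - K1 + 1K1, where 1 = ones(n, n) / n."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

# Toy 2-D points standing in for joint-angle vectors (hypothetical data).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
K = rbf_gram(X, sigma=1.5)
Kc = center_gram(K)
```

Every row and column of Kc sums to zero, which is the Gram-matrix counterpart of zero-mean features. Running kernel PLS once on K and once on Kc is one way to compare the centered and non-centered variants on the same data.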
-
RBPF-PLS: Future work for journal publication - [Research]
2007-09-04
1. Compare with PCA in terms of latent space mapping
2. What is the advantage of our method? Using examples to illustrate 'smart sampling'
3. Partition the sample space in upper and lower body to see the performance.
4. Try a more sophisticated dynamical model, at least a second-order Markov model. "Your dynamical model is very simple, just a 1st order markov model. Why don't you use something more complicated, or at least a second order markov model?"
5. Use kernels to capture nonlinearities in the motion space. "Why are you using a linear relationship between left and right side of the body? "
6. Analyze the error cases. "I also concern about the error cases shown in the videos. They should have been analyzed and discussed. "
-
Today, Professor Gang Qian called me and discussed several questions regarding my work on 3D articulated tracking. He made several good points. The discussion also stimulated me to think more deeply about the advantages of PLS-RBPF and why it can be called RBPF inference. He raised the following points:
1. The Kalman prediction using the correlation B effectively restricts the samples to be drawn only from temporal dynamics that conform well to human motion, eliminating samples that do not conform to human motion structure.
2. Dr. Qian thinks that until each sample is weighted by the image likelihood, we are not in the RBPF tracking framework, because until the "weighting" step no true measurement is incorporated. I have convinced him that we are in the RBPF tracking framework because of the Kalman update step, which incorporates the true measurement through the auxiliary measurement Ot-1, that is, the leaf mean state at the previous time step.
3. Dr. Qian thinks that our movement modeling method, which learns the correlation using PLS regression, is more general in the sense that as long as there are correlations in the joint angle data, we can learn them from the training data and apply them to tracking. Thus our method is able to track other kinds of motion, such as jogging, running or even dancing sequences.
-
These are some avenues for my future work on 3D human motion tracking using PLS-RBPF.
Improve the tracking performance by improving the algorithm in the following aspects:
1. A more accurate and robust background subtraction method.
2. A more powerful dynamic model. Reference: Learning and classification of complex dynamics, by North, PAMI 2000.
3. The parameters of the Kalman filter, including A, H or the noise covariance matrix, can be learned in some way. Reference: Qiang Wang, Learning object intrinsic structure for robust visual tracking. CVPR 2003.
4. Good image measurements such as optical flow. References: T.B. Moeslund and E. Granum. Multiple cues used in model-based human motion capture. In International Conference on Face and Gesture Recognition, 2000. S. X. Ju, M. J. Black, and Y. Yacoob. Cardboard people: A parameterized model of articulated image motion. In Intl. Conf. on Automatic Face and Gesture Recognition, pages 38–44, 1996.
5. Incorporate real measurements into the Kalman filter update. But how? Reference: S. Wachter and H.-H. Nagel. Tracking persons in monocular image sequences. Computer Vision and Image Understanding, 74(3):174–192, June 1999.
6. Incorporate temporal information into the learned correlation model. Reference: I. Kakadiaris and D. Metaxas. Model-based estimation of 3D human motion. IEEE PAMI, 22(12):1453–1459, December 2000.
7. Try kernel PLS to capture the non-linear dynamics between left- and right-side motions.
The current 3D tracking work can be improved from the following aspects:
1. The interpretation of the PLS model, including the loadings, the LVs and the biplot.
2. How significant is the learned correlation? This can be shown by visualizing the predicted left-side joint angles and the ground truth right-side angles.
3. Analyze the sensitivity of the tracker's performance to (i) the amount of training data and (ii) the variance captured by the PLS. In addition, answer "what criterion measures the goodness of the learned correlation model? Is it the variance captured by the PLS regression? If not, then what?"
4. Put exemplar images in the 3D latent-space visualization of the correlation model.
5. The presentation of the tracking algorithm in Sec. 4 of the ICCV paper could be improved. See Qiang Wang, CVPR 2003.
-
Two questions regarding the dimensionality reduction techniques LLE, Isomap and Laplacian Eigenmaps
1. Does there exist an inverse mapping from the embedding space to the original high-dimensional state space?
2. What is the difference between these three methods?
Two papers that applied dimensionality reduction to visual tracking
Qiang Wang et al. Learning object intrinsic structure for robust visual tracking. CVPR 2003
Ahmed Elgammal et al. Inferring 3D pose from silhouettes using activity manifold learning.
-
The anatomy of a color profile
2007-04-23
ICC profiles consist of a header and a set of tags, which contain the bulk of the data. You can examine the contents of profiles with ICC Profile Inspector, which you can download from the ICC Resource Center by clicking on CONTINUE and following the instructions. When you run it, click Browse... to load the profile. The header information and tag table are displayed. Double-click on a tag to see its contents. A typical tag, gXYZ (green primary color), is illustrated on the right.
There are three classes of profile, indicated in the Device Class field of the header: Input ('scnr'), Display ('mntr'), and Output ('prtr'). Each has a set of required tags and a set of optional tags. Display profiles are used to define color spaces. Many monitor profiles contain the vcgt tag (http://developer.apple.com/techpubs/mac/ACI2.5/WhatsNewColorSync25.6.html) for setting lookup tables when a loader program is run, i.e., for calibrating the monitor.
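The header-plus-tag-table layout described above can be read with a few lines of code. This is a minimal sketch under stated assumptions: parse_icc and make_demo_profile are hypothetical names, and the demo bytes only mimic the layout (a 128-byte header with the device class at byte 12 and the 'acsp' signature at byte 36, followed by a tag count and 12-byte tag entries); they are not a usable profile.

```python
import struct

def parse_icc(data):
    """Return (device_class, [(tag_signature, offset, size), ...]) from raw ICC bytes."""
    if len(data) < 132 or data[36:40] != b"acsp":
        raise ValueError("not an ICC profile")
    device_class = data[12:16].decode("ascii")
    (tag_count,) = struct.unpack(">I", data[128:132])
    tags = []
    for i in range(tag_count):
        start = 132 + 12 * i
        sig, offset, size = struct.unpack(">4sII", data[start:start + 12])
        tags.append((sig.decode("ascii"), offset, size))
    return device_class, tags

def make_demo_profile():
    """Build synthetic bytes with a valid header/tag-table layout (illustration only)."""
    tag_entries = [(b"desc", 156, 16), (b"gXYZ", 172, 20)]
    header = bytearray(128)
    header[12:16] = b"mntr"   # Display device class
    header[36:40] = b"acsp"   # profile file signature
    table = struct.pack(">I", len(tag_entries))
    for sig, offset, size in tag_entries:
        table += struct.pack(">4sII", sig, offset, size)
    data = bytes(header) + table + bytes(36)  # 36 zero bytes stand in for the tag data
    return struct.pack(">I", len(data)) + data[4:]  # bytes 0-3 hold the profile size

device_class, tags = parse_icc(make_demo_profile())
print(device_class, tags)
```

On a real profile the same function would be called on the bytes read from the .icc file, giving the same list that ICC Profile Inspector shows as its tag table.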
The meaning of the tags is specified in the formidable (126 page) ICC File Format for Color Profiles (Version 4.0.0), which is rich in content and readable if you skip the bureaucratic parts (strong coffee recommended). In most cases the meaning will be obvious from the ICC Profile Inspector display. The basic tag signatures (4-character abbreviations) are
- desc is the description of the profile, used in PW Pro selection boxes.
- rXYZ, gXYZ and bXYZ specify the R, G, and B primaries that determine the gamut of the color space or device.
- wtpt is the white point. The two standard white points are 6500K (D65): X=0.95045, Y=1.0, Z=1.08905, and 5000K (D50): X=0.96429, Y=1.0, Z=0.82510. Y is always 1.0 and Z varies the most. Used for absolute colorimetric gamut mapping, which is of little interest to photographers.
- rTRC, gTRC and bTRC are the R, G and B Tone Reproduction Curves that define device or color space gamma in Input and Monitor profiles. An example (gTRC) is shown on the right. Gamma is often indicated on the upper right. If it isn't, it can be calculated from
gamma = -ln(y5)/0.69315
where x' = x/xmax, y' = y/ymax, and y5 = y' at x' = 0.5 (the middle of the x-axis) = 0.22 (in the curve on the right). Typical values: y5 = 0.287 for gamma = 1.8; 0.25 for gamma = 2.0; 0.218 for gamma = 2.2. [This equation can be easily derived from (y/ymax) = (x/xmax)^gamma.]
- AToBn or BToAn are gamut mapping tables used in printer profiles. A refers to the device; B refers to the profile connection space (PCS); n = 0 for perceptual, 1 for colorimetric or 2 for saturation rendering intent. BToAn tags are used for printing; AToBn are used for proofing (previewing the print). These tables are large. Profiles that contain them can be several hundred kilobytes, sometimes over a megabyte. TRC tags are omitted in printer profiles. All the printer profiles I've examined have gamt (out-of-gamut) tags, but little information is available about them. The best way to examine the actual performance of printer profiles is with Gamutvision.
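The gamma relation given for the TRC tags can be checked numerically. The constant 0.69315 is ln 2, so the formula is y5 = 0.5^gamma solved for gamma. A small sketch, with hypothetical function names:

```python
import math

def gamma_from_midpoint(y5):
    """Gamma from the normalized curve value y5 at x' = 0.5: gamma = -ln(y5) / ln(2)."""
    return -math.log(y5) / math.log(2.0)

def midpoint_from_gamma(gamma):
    """Inverse relation: y5 = 0.5 ** gamma."""
    return 0.5 ** gamma

for g in (1.8, 2.0, 2.2):
    print(g, round(midpoint_from_gamma(g), 3))
print(round(gamma_from_midpoint(0.22), 2))
```

For the example curve with y5 = 0.22, this gives a gamma of about 2.18, close to the nominal 2.2.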
Manufacturer-private tags make it difficult to figure out what a profile is supposed to do. An example is vcgt (Video card gamma tag, registered to Apple), widely used in monitor profiles (Adobe, Monaco, Praxisoft, etc.) to set video card LUT's. It is not to be found in the ICC specification. Another example: Mtbx, in the monitor profiles created by Adobe Gamma and MonacoEZcolor. Try searching Google and you'll find pages on mountain bikes. Here is what the spec says (p. 3): "Private data tags allow CMM developers to add proprietary value to their profiles. By registering just the tag signature and tag type signature, developers are assured of maintaining their proprietary advantages while maintaining compatibility with this specification. However, the overall philosophy of this format is to maintain an open, cross-platform standard, therefore the use of private tags should be kept to an absolute minimum." And that is how things would be in the best of all possible worlds.
Of course a profile's actual performance is more than the sum of its parts. To see how a profile functions with different rendering intents under a variety of conditions, you'll need Gamutvision.
-
Notes on color management
2007-04-23
The following are from http://www.normankoren.com/color_management_2.html
1 ICC Profiles consist primarily of tables that relate numeric data, for example, RGB (222,34,12), to colors expressed in a device-independent CIE color space called a Profile Connection Space (PCS)-- either CIE-XYZ or CIELAB.
2 Monitor profiles have the same format as color space profiles. Profiles may contain additional data, such as a preferred rendering intent and gamma. Monitor profiles often contain instructions for loading video card lookup tables, i.e., for calibrating the monitor.
3 Color management has two key features:
- ICC profiles: files that define the meaning of numeric color image data, e.g., RGB = (17, 44, 227). Profiles can define color in devices (scanners, digital cameras, printers, etc.) or image color spaces. All digital images refer to a color space, either explicitly via an embedded or user-specified profile, or implicitly: Windows assumes the sRGB color space if none is specified.
- Gamut mapping: The transformation that takes place when an image is transferred between formats or devices, for example,
- from one color space to another.
- from an image in memory to a monitor.
- from an image in memory to a printer.
Configuration settings using Picture Window Pro
Color Management Settings for Picture Window Pro
Each entry below gives the box name, its available settings, and recommendations and comments.

Color Management
Settings: Disabled or Enabled
Recommendations and comments: Enabled turns on color management.

Color Engine
Settings: Windows Default (ICM 2.0) or lcms (Little CMS)
Recommendations and comments: Windows Default. Lcms probably works equally well.

Working Color Space
Settings: Choice of profiles* or None. (The actual working color space of an image can be determined by right-clicking on the image, clicking on Display Info and observing the Color Profile setting.) Used differently from Photoshop. See note below.
Recommendations and comments: Specifies the working color space to convert images to when they are opened, if you so choose. (See On Profile Mismatch.) Has no immediate effect on image appearance. A key user decision. sRGB is the simplest and most compatible with monitors and the Internet, but a medium-gamut color space such as Adobe RGB (1998) (identical to SMPTE-240M) is recommended when the primary output is high quality prints. Wide gamut spaces present some problems. The ICC profile of the working color space will be embedded in the image when it is saved. See Working color space, below.

Assumed File Profile
Settings: Choice of profiles* or None
Recommendations and comments: The assumed color space of an image file that has no embedded profile. sRGB (Windows/Internet default) is usually the best choice. The default, None, implies sRGB.

On Profile Mismatch
Settings: Ask/Convert/Don't Convert
Recommendations and comments: Ask is the best choice to start out with. The dialog box will ask for the rendering intent. (Don't Ask/Don’t Tell was omitted.)

Assumed Scanner Profile
Settings: Choice of scanner profiles* or None
Recommendations and comments: I recommend None if the profile can be selected in the scanner driver. Otherwise select the appropriate profile here. It should not be selected in both; the profile would be applied twice. If you set the scanner profile here, the imported file will contain original scanner data with the scanner profile embedded. You may be asked if you want to convert it into your working color space. Selecting the profile in the scanner driver software reduces the chance of error; the results are identical.

Monitor Profile
Settings: Choice of profiles* or None
Recommendations and comments: You should use the profile created by your monitor profiling program. It's a good idea to check it with ICC Profile Inspector to be sure the TRC (tone response curve) tags agree with the value of gamma set during calibration (usually 2.2). If your monitor profile has a different gamma, use sRGB IEC61966-2.1, which is an essentially neutral profile with gamma = 2.2 and R, G, and B primaries close to typical CRT monitors. If None is set, images are sent to the monitor without gamut mapping. This can result in significant errors for working spaces other than sRGB. See Monitor profiling.

Monitor Rendering Intent
Settings: One of the four rendering intents
Recommendations and comments: Maintain Full Gamut (Perceptual) is the best choice in most cases.

Proofing Profile
Settings: Choice of printer profiles* or None
Recommendations and comments: None, most of the time. Used mainly by the printing industry to preview low gamut CMYK printing press output on the monitor. To preview a printer/ink/paper combination, select the appropriate profile. May not work well with high quality inkjet printers. Compare it with your printed output to see if it works for you.

Proofing Rendering Intent
Settings: One of the four rendering intents
Recommendations and comments: Preserve Identical Colors and White Point (Relative colorimetric) or Maintain Full Gamut (Perceptual). Try both; the difference can be significant with large gamut working color spaces. Inactive when Proofing Profile is set to None.

Monitor Calibration
Settings: (Removed from current versions of PW Pro. It set the LUT, overriding other calibration settings.)
Recommendations and comments: Disabled
-
Human motion tracking using HumanEva data - [Research]
2007-03-13
The mocap data from HumanEva records marker positions, so I need to compute the global and local transforms from the marker data and then compute the body angles from the transforms.
First, try to project the 3D poses to 2D and see how well the poses fit the image data, using trial one of any subject (S1, S2, S3 or S4).
If they fit well, then use three sets of data (S1_walking_trial3, S2_walking_trial3, S3_walking_trial3) to learn the motion prior.
After that, track the moving person in sequences S1_walking_trial1, S2_walking_trial1 and S3_walking_trial1, and evaluate by comparing the estimated poses with the ground truth MoCap data. I can also track the moving person in sequences S1_walking_trial2, S2_walking_trial2 and S3_walking_trial2, but since no motion capture data is given for them, I cannot do any further evaluation and can only inspect the results visually.
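The projection check in the first step can be sketched with a plain pinhole camera. This is only an illustration under stated assumptions: the function name, the intrinsics (fx, fy, cx, cy) and the extrinsics (R, t) below are hypothetical values, not the HumanEva calibration, and lens distortion is ignored.

```python
def project_point(X, R, t, fx, fy, cx, cy):
    """Project a 3D world point X into pixel coordinates with a pinhole camera.

    R (3x3 nested lists) and t (length 3) map world coordinates to camera coordinates.
    """
    xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)

# Hypothetical camera: identity rotation, no translation, 640x480 image.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
joint = [1.0, -0.5, 2.0]  # a made-up joint position in meters
u, v = project_point(joint, R, t, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(u, v)
```

Overlaying the projected joints of one frame on the corresponding image is enough to see whether the transforms computed from the marker data are consistent with the camera.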
-
Upcoming tasks on color management - [Research]
2007-03-03
What has been done:
1. Given an image, using the Little CMS C++ API, I have been able to apply different output color profiles to the same image and show its appearance side by side after the different output color profile transformations.
What is the next step:
2. After a discussion with my advisor, I am going to pursue the following objective:
Given an image I shown on LCD A, try to find the image J so that the appearance of J on a desktop monitor looks the same as the appearance of I on A. Basically, I would like to find an appropriate output color profile (xxx.icc) so that after I apply the ICC transform (J = icctrans(I, '-t0 -o xxx.icc')), image J looks the same on the monitor as I does on the LCD.
But then fundamental questions arise:
The purpose of color management is to make the same image look consistent on different devices, so why does the same image look different when I display it on one LCD and one CRT monitor? What is the color profile used in the working color space, namely the PCS space, and what is the color profile for the LCD (or a CRT)? Is the profile used for the output device (like an LCD or CRT) the same as the one used in the working color space?
-
The color management
2007-03-01
I have been reading color management materials recently. I found that it is really a big topic. It's not easy to grasp all the fundamental concepts in color science and color management in a short period of time.
Regarding my project "Color management in ophthalmology images", I have some questions:
1. What are the problems we want to address? Specifically, what benefit do we want to get from color management of the retina images, and what is our objective?
2. We may consider setting up a new standard regarding which working color space is most appropriate for displaying retina images (in the way that sRGB is the default working color space for the Web and Adobe RGB (1998) is widely recommended when the primary output is high quality inkjet prints), and which gamut mapping intent is appropriate for ophthalmology images.
-
Call matlab from C++ using matlab compiler - [Research]
2007-02-20
The steps for using the MATLAB Compiler to call MATLAB code from a C++ environment:
1. At the MATLAB prompt, type
>> mbuild -setup
to configure the C++ compiler.
2. At the MATLAB prompt, type
>> mcc -W cpplib:libIMclassify -T link:lib addshape.m colorf.m fea4blocks.m
This creates a shared library from an arbitrary set of M-files (or MEX-files). The MATLAB Compiler generates a wrapper file, plus libIMclassify.h, libIMclassify.dll, libIMclassify.lib and libIMclassify.ctf. The C++ application needs the .dll, .lib, .h and .ctf files, along with any other files that the MATLAB .m files depend on (e.g. .mat or .dll files).
3. Put the .dll, .lib, .h and .ctf files in the folder of the C++ application.
4. Write a C++ wrapper that calls the shared library generated by the MATLAB Compiler.
In my image categorization project, I wrote the following C++ wrapper, named main.cpp:
#ifdef __APPLE_CC__
#include <CoreFoundation/CoreFoundation.h>
#include <pthread.h>
#endif
#include <iostream>
#include "libIMclassify.h"
using namespace std;

#ifdef _MSC_VER
extern "C"
#endif
void* ClassifyIM(const char** inIMfn, const int &nIM, void *x)
{
    int *err = (int *)x;
    if (err == NULL) return 0;

    // Call application and library initialization. Perform this
    // initialization before calling any API functions or
    // Compiler-generated libraries.
    if (!mclInitializeApplication(NULL,0))
    {
        std::cerr << "could not initialize the application properly" << std::endl;
        *err = -1;
        return x;
    }
    if( !libIMclassifyInitialize() )
    {
        std::cerr << "could not initialize the library properly" << std::endl;
        *err = -1;
    }
    else
    {
        try
        {
            // Create the input data
            // Input parameters:
            // B: test image file name cell array and
            // nIM: number of test images to be classified
            int arr[1] = {nIM};
            mwArray N(1,1,mxINT8_CLASS,mxREAL);
            N.SetData(arr, 1);
            mwArray B(1, nIM, mxCELL_CLASS); // the image names in cell array format
            for(int i = 1; i <= nIM; i++)
            {
                mwArray fn(inIMfn[i-1]);
                B.Get(1,i).Set(fn);
            }
            // Create output parameters
            // lbl: the predicted label, an array of size nIM
            mwArray lbl(nIM,1,mxUINT8_CLASS,mxREAL);

            // Call the library function
            ImageCategorization(1, lbl, B, N);

            // Convert the predicted labels from the MATLAB mwArray to an unsigned int array in C++
            unsigned int *lblp = new unsigned int[nIM];
            lbl.GetData(lblp,nIM);

            // Display the returned predicted label for each image
            for(int i = 0; i < nIM; i++)
                std::cout << "Label of " << inIMfn[i] << " is: " << lblp[i] << std::endl;
            delete[] lblp; // release the temporary label buffer
        }
        catch (const mwException& e)
        {
            std::cerr << e.what() << std::endl;
            *err = -2;
        }
        catch (...)
        {
            std::cerr << "Unexpected error thrown" << std::endl;
            *err = -3;
        }
        // Call the library termination routine
        libIMclassifyTerminate();
    }
    /* On MAC, you need to call mclSetExitCode with the appropriate exit status
     * Also, note that you should call mclTerminate application in the end of
     * your application. mclTerminateApplication terminates the entire
     * application and exits with the exit code set using mclSetExitCode. Note
     * that this behavior is only on MAC platform.
     */
#ifdef __APPLE_CC__
    mclSetExitCode(*err);
#endif
    mclTerminateApplication();
    return 0;
}

int main()
{
    int err = 0;
    int nIM = 3;
    const char* imfn[] = {"test001.jpg", "test2.jpg","test0003.jpg"};
#ifdef __APPLE_CC__
    // NOTE: run_main is not defined in this listing. On Mac it must be a
    // void*(void*) thread entry point that calls ClassifyIM.
    pthread_t id;
    pthread_create(&id, NULL, run_main, &err);

    CFRunLoopSourceContext sourceContext;
    sourceContext.version = 0;
    sourceContext.info = NULL;
    sourceContext.retain = NULL;
    sourceContext.release = NULL;
    sourceContext.copyDescription = NULL;
    sourceContext.equal = NULL;
    sourceContext.hash = NULL;
    sourceContext.schedule = NULL;
    sourceContext.cancel = NULL;
    sourceContext.perform = NULL;

    CFRunLoopSourceRef sourceRef = CFRunLoopSourceCreate(NULL, 0, &sourceContext);
    CFRunLoopAddSource(CFRunLoopGetCurrent(), sourceRef, kCFRunLoopCommonModes);
    CFRunLoopRun();
#else
    ClassifyIM(imfn, nIM, &err);
#endif
    return err;
}
5. Create a win32 console project using Visual Studio C++ 2003, and add main.cpp and libIMclassify.h to the project.
6. Set the C++ project settings in Visual Studio .NET 2003.
6.1 Click menu Project->Project settings->Linker->General and add C:\Program Files\MATLAB\R2006b\extern\lib\win32\microsoft;F:\RA\TactileOurData\NET\ImageCategorization\ImageCategorization to "Additional Library Directories". The second directory is the one that holds your C++ project.
6.2 Click Project->Project settings->Linker->Input and add "libdflapack.lib libeng.lib libfixedpoint.lib libmat.lib libmex.lib libmwlapack.lib libmwservices.lib libmx.lib libut.lib mclcom.lib mclcommain.lib mclmcr.lib mclmcrrt.lib mclxlmain.lib libIMclassify.lib" to the "Additional dependencies" entry. The last one, libIMclassify.lib, is the lib file generated by the MATLAB Compiler and is located in the current project directory (F:\RA\TactileOurData\NET\ImageCategorization\ImageCategorization).
7. Build and run.
A standalone MATLAB application may also be built using the mcc and mbuild commands. For example, typing
mbuild main.cpp libIMclassify.lib
at the MATLAB prompt or the OS prompt will create an executable named main.exe.
-
I am trying to port my image categorization MATLAB code into the C++ .NET framework so that Jessie can directly classify test images in her .NET environment. I could rewrite all the code using OpenCV, but that would take quite a lot of time since the system is large, and there are some features (like the Daubechies-4 wavelet transform) that OpenCV does not support at all. To reduce the burden of rewriting the code from MATLAB to C++, my strategy is to call the MATLAB engine from the OpenCV code so that I can reuse most of the code written in MATLAB. It should work, but I need to try.
1 Training phase (offline).
Input: a training set file containing the features
Output: model file
Task: rewrite the training matrix into the format that can be used directly by the C++ SVM package, and then call svmtrain from the command line to get the model file.
2 Testing phase (in OpenCV, interacting with MATLAB through the MATLAB C engine)
2.1 Feature extraction for the images Jessie passes on to me
a. In OpenCV, convert IplImage into matlab image matrix
b. In OpenCV, call the matlab engine
c. Generate the test cell matrix, test{1,...,n}, test{i}=9X6
2.2 Multi-class feature mapping
a. Input the raw test feature matrix in the format of matlab matrix
b. call the matlab engine
c. produce the multi-class feature matrix.
3. Generate the test file (in txt format) using OpenCV
Input: the multi-class test feature matrix
Output: a txt file containing the predicted label for each test image.
a. convert multi-class feature matrix into txt format
b. call the svmpredict dll to output a testresult.txt containing the predicted labels for each image.
4. Generate the svmtrain and svmpredict DLLs using .NET
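The "rewrite the feature matrix into a text format" steps can be sketched as follows. This assumes the SVM package is LIBSVM, whose data files use one line per sample in the form label index:value with 1-based feature indices; the notes do not name the package, so treat that as an assumption. The function name and the toy matrix are hypothetical.

```python
def to_libsvm_lines(labels, features):
    """Format a dense feature matrix as LIBSVM-style text lines, skipping zero entries."""
    lines = []
    for label, row in zip(labels, features):
        parts = [str(label)]
        for index, value in enumerate(row, start=1):  # feature indices start at 1
            if value != 0:
                parts.append("%d:%g" % (index, value))
        lines.append(" ".join(parts))
    return lines

# Toy multi-class feature matrix: two images, three features each.
labels = [1, 3]
features = [[0.5, 0.0, 2.0], [0.0, 1.25, 0.0]]
lines = to_libsvm_lines(labels, features)
print("\n".join(lines))
```

The resulting lines can be written to the training or test txt file. For test images whose labels are unknown, a placeholder label is typically written in the first column.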
-
On linear programming
2007-01-31
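Standard form
- A linear function to be maximized
- e.g. maximize $c_1 x_1 + c_2 x_2$
- Problem constraints of the following form
- e.g. $a_{11} x_1 + a_{12} x_2 \le b_1$, $a_{21} x_1 + a_{22} x_2 \le b_2$, $a_{31} x_1 + a_{32} x_2 \le b_3$
- Non-negative variables
- e.g. $x_1 \ge 0$, $x_2 \ge 0$
The problem is usually expressed in matrix form, and then becomes:
- maximize $\mathbf{c}^T \mathbf{x}$
- subject to $\mathbf{A}\mathbf{x} \le \mathbf{b}$, $\mathbf{x} \ge 0$

Augmented form (slack form)
Linear programming problems must be converted into augmented form before being solved by the simplex algorithm. This form introduces non-negative slack variables to replace inequalities with equalities in the constraints. The problem can then be written in the following form:
- Maximize Z in:
$$\begin{bmatrix} 1 & -\mathbf{c}^T & 0 \\ 0 & \mathbf{A} & \mathbf{I} \end{bmatrix} \begin{bmatrix} Z \\ \mathbf{x} \\ \mathbf{x}_s \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{b} \end{bmatrix}, \quad \mathbf{x}, \mathbf{x}_s \ge 0$$
where $\mathbf{x}_s$ are the newly introduced slack variables, and Z is the variable to be maximized.
Example
The example above becomes as follows when converted into augmented form:
maximize $c_1 x_1 + c_2 x_2$ (objective function)
subject to
$a_{11} x_1 + a_{12} x_2 + x_3 = b_1$ (augmented constraint)
$a_{21} x_1 + a_{22} x_2 + x_4 = b_2$ (augmented constraint)
$a_{31} x_1 + a_{32} x_2 + x_5 = b_3$ (augmented constraint)
where $x_3, x_4, x_5$ are (non-negative) slack variables.
Which in matrix form becomes:
- Maximize Z in:
$$\begin{bmatrix} 1 & -c_1 & -c_2 & 0 & 0 & 0 \\ 0 & a_{11} & a_{12} & 1 & 0 & 0 \\ 0 & a_{21} & a_{22} & 0 & 1 & 0 \\ 0 & a_{31} & a_{32} & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} Z \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}, \quad x_1, \ldots, x_5 \ge 0$$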
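The conversion to augmented form can be made concrete with a small numeric sketch. The helper names and the numbers below are made up for illustration: two variables and three constraints, so three slack variables are introduced.

```python
def augment(c, A, b):
    """Build the block system [[1, -c^T, 0], [0, A, I]] and right-hand side [0, b]."""
    m = len(A)
    M = [[1.0] + [-cj for cj in c] + [0.0] * m]
    for i in range(m):
        slack_cols = [1.0 if k == i else 0.0 for k in range(m)]
        M.append([0.0] + list(A[i]) + slack_cols)
    return M, [0.0] + list(b)

def slacks(A, b, x):
    """Slack values x_s = b - A x for a given point x."""
    return [b[i] - sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(b))]

# Hypothetical problem: maximize 3*x1 + 2*x2 subject to three <= constraints.
c = [3.0, 2.0]
A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
b = [4.0, 3.0, 2.0]

M, rhs = augment(c, A, b)
x = [3.0, 1.0]                       # a feasible point
xs = slacks(A, b, x)                 # non-negative because x is feasible
Z = sum(cj * xj for cj, xj in zip(c, x))
v = [Z] + x + xs                     # the stacked vector (Z, x, x_s)
residual = [sum(M[i][k] * v[k] for k in range(len(v))) - rhs[i] for i in range(len(rhs))]
print(M[0], xs, Z, residual)
```

The zero residual shows that (Z, x, x_s) satisfies the equality system. The first row just restates Z = c^T x, and the remaining rows are the original inequalities with the slack absorbed.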
Duality
Every linear programming problem, referred to as a primal problem, can be converted into a dual problem, which provides an upper bound to the optimal value of the primal problem. In matrix form, we can express the primal problem as:
- maximize
- subject to
The corresponding dual problem is:
- minimize
- subject to
where y is used instead of x as the variable vector.
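The upper-bound claim in the Duality part can be checked numerically. The sketch below assumes the usual primal (maximize c^T x subject to A x <= b, x >= 0) and dual (minimize b^T y subject to A^T y >= c, y >= 0), and verifies that the objective of a feasible primal point never exceeds the objective of a feasible dual point. All numbers are made up for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def primal_feasible(A, b, x, tol=1e-9):
    """Check x >= 0 and A x <= b."""
    return all(xi >= -tol for xi in x) and all(
        dot(row, x) <= bi + tol for row, bi in zip(A, b))


def dual_feasible(A, c, y, tol=1e-9):
    """Check y >= 0 and A^T y >= c."""
    columns = list(zip(*A))
    return all(yi >= -tol for yi in y) and all(
        dot(col, y) >= cj - tol for col, cj in zip(columns, c))


# Hypothetical problem data.
A = [[1.0, 1.0], [2.0, 1.0]]
b = [4.0, 5.0]
c = [3.0, 2.0]
x = [1.0, 1.0]  # a feasible primal point
y = [1.0, 1.0]  # a feasible dual point
primal_value = dot(c, x)  # 5.0
dual_value = dot(b, y)    # 9.0
```

Here the primal value 5 stays below the dual value 9. The primal point `[1, 3]` is also feasible and reaches 9, so for this data the bound is tight.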
-
for each train/test split s
    for each possible parameter setting
        for each fold
            obtain numCat models (w, b) using a 1-norm SVM trained on one fold of the training set;
            apply the testing fold to the 5 models;
            calculate the accuracy;
            store the accuracy;
        end
    end
end
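The two inner loops above (parameter settings and folds) can be sketched as follows. This is a sketch under assumptions, not my actual experiment code: the 1-norm SVM is replaced by caller-supplied `train_fn` and `predict_fn` callbacks (hypothetical names), the folds are contiguous, and each model is trained on all folds except the held-out one, which is the conventional reading of k-fold cross validation. The outer loop over train/test splits would simply call `grid_search_cv` once per split.

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous, nearly equal folds."""
    folds, start = [], 0
    for f in range(n_folds):
        size = n_samples // n_folds + (1 if f < n_samples % n_folds else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(samples, labels, param_grid, train_fn, predict_fn, n_folds=5):
    """Return {param: mean accuracy over the folds} for every parameter setting."""
    folds = k_fold_indices(len(samples), n_folds)
    scores = {}
    for param in param_grid:
        accuracies = []
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(len(samples)) if i not in held]
            model = train_fn([samples[i] for i in train_idx],
                             [labels[i] for i in train_idx], param)
            correct = sum(predict_fn(model, samples[i]) == labels[i]
                          for i in held_out)
            accuracies.append(correct / len(held_out))  # accuracy on this fold
        scores[param] = sum(accuracies) / len(accuracies)
    return scores
```

The best setting is then `max(scores, key=scores.get)`.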
-
Calling C from matlab----some experience
2007-01-31
I am using the GNU GLPK toolkit to solve a linear programming problem. Unfortunately GLPK is written for C/C++, so I have to write a mex interface in order to call GLPK directly from MATLAB.
Some online resources for writing and compiling mex files:
http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_external/
1 A trick to compile a mex file together with a .lib file:
matlab prompt> mex glpkmex.cpp glpk.lib
After compiling, the mex file has the extension .mexw32 on MATLAB versions higher than 7.3.
2 A trick to debug a mex file through MS Visual Studio .NET and MATLAB
see
http://www.mathworks.com/access/helpdesk/help/techdoc/
then click "External interfaces"-->"Creating C Language MEX-Files"-->"Debugging C Language MEX-Files"
3 We can also try to compile a mex file using MS Visual Studio .NET, see
http://www.mee.tcd.ie/~sigmedia/intranet/coding/HowtoMex.php
4 We can also use gcc to create a mex file, see
-
Preparation for ICML 2007
2007-01-06
1 Add the problem definition of multi-class MI classification, and list the traditional binary-class classification. Refer to [1] for details.
2 Independently implement DD-SVM and compare its performance on the SIMPLIcity dataset with MCMI. Use the comparison to emphasize the advantage of MC over binary classification.
3 Add the analysis of sensitivity to the number of categories, to the training sample size, to the number of negative examples, and to label uncertainty.
4 Add the comparison to other binary classification methods like MILES, MI-SVM, k-means-SVM, s-SVM.
5 Explicitly explain when MC may be better than binary MI classification.
6 Draw the ROC curve. Refer to [2,3].
[1] An open multiple instance learning framework and its application in drug activity prediction problems
[2] Supervised versus Multiple instance learning: an empirical comparison
[3] MISSL
-
Plan on trying RBPF-PLS on ASU data set
2006-11-05
Only use view 2 and view 3 of the ASU data. The full set of ASU data is located in
Through manual fitting, we have calculated a new camera calibration matrix for view 2 and view 3. Using the newly calculated camera calibration matrix, the first frame (000050.png) can be fitted, but we cannot guarantee that the projections of the 3D angles fit the other images under the new matrix.
The fitted first-frame joint angles, the limb lengths and the limb parameters have been stored in
The MATLAB code for manually fitting the parameters of the first frame (including the joint angle vector of the first frame, the limb lengths and limb parameters, and the camera calibration matrix) is located in
We will test our algorithm using the code and data located in C:\RA\BodyTracking\RBPF-PLS on ASU data.
-
step 1: preprocessing;
step 2: organize the training and testing data;
step 3: define features for each category;
step 4: instance generation for each category using the features defined in step 3;
step 5: learn the instance prototype for each category;
step 6: classify the images in the test set;
step 7: calculate the confusion matrix, precision, recall and accuracy.
Expected output:
confusion matrix, precision, recall, accuracy; the corresponding major instance for each testing image in each category.
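Step 7 can be sketched as follows. This is a minimal sketch assuming that class labels are integers `0 .. num_classes-1`; the function names are hypothetical, and a class with no predictions (or no samples) gets precision (or recall) 0 by convention here.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """matrix[t][p] counts the samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def per_class_metrics(matrix):
    """Return per-class precision and recall lists plus the overall accuracy."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    precision, recall = [], []
    for k in range(n):
        true_positive = matrix[k][k]
        predicted_k = sum(matrix[t][k] for t in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision.append(true_positive / predicted_k if predicted_k else 0.0)
        recall.append(true_positive / actual_k if actual_k else 0.0)
    accuracy = sum(matrix[k][k] for k in range(n)) / total if total else 0.0
    return precision, recall, accuracy
```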
-
Upcoming conferences I am interested in
2006-09-29
1 ISBI: IEEE International Symposium on Biomedical Imaging, Arlington, VA, Apr 10-16, http://www.biomedicalimaging.org/
2 IEEE International Conference on Multimedia & Expo, Beijing, China, Jul 2-5, http://research.microsoft.com/conferences/icme07/
3 IEEE Workshop on Statistical Signal Processing, Madison, WI, Aug 26-30, http://www.ee.duke.edu/ssp07/
4 IEEE International Workshop on Machine Learning for Signal Processing, Thessaloniki, Greece, Aug 27-30, http://mlsp2007.teithe.gr/
5 IEEE International Conference on Image Processing, San Antonio, TX, Sep 16-19, http://www.icip2007.com/
6 ICCV 2007, Rio de Janeiro, Brazil, Oct 14-21, 2007, http://iccv2007.rutgers.edu/
7 CVPR 2007, http://cvpr.cv.ri.cmu.edu/
-
From Jul 10 to Jul 20, Tom and I were busy capturing some new data. Now we have finished capturing the video data, and in the following days we are going to do the data analysis, including:
1 Splitting the video into individual frames and selecting the intersecting frames across the four views to serve as the tracking data.
2 Doing background subtraction to segment out the profile of the person. These silhouettes will serve as the observation likelihood in our tracking algorithm.
3 Manual initialization on the first frame, including approximating the limb lengths and limb parameters, the angle values for the first frame and possibly the camera calibration information.
Some real problems/difficulties exist when trying our algorithm on the new data:
1 The limb length and limb scaling parameters are not available.
2 The camera calibration matrix is unavailable and needs to be estimated by ourselves.
3 Numerical training angles are unavailable (some tracking parameters need to be estimated from the numerical training angles).
4 Manually initializing the first frame could also pose problems if it is not accurate.
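The background subtraction in analysis step 2 can be sketched in its simplest form: threshold the absolute difference between a frame and a reference background image. This sketch assumes grayscale images stored as nested lists and a static background; the threshold of 30 gray levels is an arbitrary placeholder, and real footage would likely need a more robust background model.

```python
def subtract_background(frame, background, threshold=30):
    """Return a binary mask that is 1 where |frame - background| > threshold."""
    return [
        [1 if abs(pixel - bg) > threshold else 0
         for pixel, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]
```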
-
Following are some preliminary thoughts on how to innovate on the MA classification using MI, and how to improve the classification accuracy:
1. Currently the dimension of the feature space is 13, including color, texture, shape and edge histogram. Consider including other features that better distinguish MA from other instances, like the color (HSV) difference between the current block and the upper, lower, left and right blocks.
2. Use Kernel PCA to reduce the dimensionality of the feature vector, leading to a more efficient search for the maximum DD point.
3. A more efficient and effective multidimensional density search strategy that localizes the maximum DD points.
4. From where should the search start?
5. Other MI learning methods or frameworks.
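The neighbor-difference feature in item 1 could look like the sketch below. It assumes each block has already been summarized by one HSV triple (here a plain mean over its pixels, which ignores the circular nature of hue) and that missing neighbors at the image border contribute zeros. Both choices, and the function names, are assumptions for illustration.

```python
import colorsys


def mean_hsv(block):
    """Mean HSV of a block given as a list of (r, g, b) pixels in [0, 1]."""
    hsv = [colorsys.rgb_to_hsv(r, g, b) for r, g, b in block]
    return tuple(sum(p[k] for p in hsv) / len(hsv) for k in range(3))


def neighbor_hsv_differences(block_hsv, row, col):
    """Return 12 values: (dH, dS, dV) against the up, down, left and right blocks."""
    rows, cols = len(block_hsv), len(block_hsv[0])
    center = block_hsv[row][col]
    features = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < rows and 0 <= c < cols:
            features.extend(center[k] - block_hsv[r][c][k] for k in range(3))
        else:
            features.extend((0.0, 0.0, 0.0))  # assumed padding at the border
    return features
```

Appending these 12 values to the current 13-dimensional vector would give a 25-dimensional feature per block.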
-
My tasks from July 15 to Aug 15 are as follows:
1. Do experiments on the new dataset (capture the new dataset, manual initialization, background subtraction).
2. Think about how to write the ICASSP 3D tracking paper.
3. Think about what else we can do to enhance the NV simulation journal paper, and start doing it.
a) Calculating the fractal dimension is one piece of work;
b) a 2-page evaluation by UTMB.
4. Think about what innovations we can make to improve the current MI MA classification. This will possibly lead to another ICASSP paper.
-
I came across some real difficulties when trying PLS_RBPF on new data that I think cannot be easily solved:
1 How to determine the limb length and the 10*4 limb parameters? In the Brown data set, these parameters are determined from the motion capture marker data, but for my own data I do not have a motion capture system and cannot calculate these parameters precisely. The only way is to approximate them by manual initialization, but that will make the tracking inaccurate.
OR can I try a new human body model that does not involve the limb parameters?
2 We do not have the calibration information for the wireless cameras.
3 Some tracking parameters need to be calculated from the motion capture training angles, and we do not have the numerical training angles.
4 Manual initialization for the first frame is also a problem.
-
June 30 meeting minute
2006-07-01
1 By next Thursday, submit a 3-page report to Dr. Li on the MA classification using MI learning, containing the implementation details, the results and some thoughts.
2 By next Friday, submit a 1-2 page report to Dr. Li on the 3D tracking: what has been done so far, what the results are (in video format) and what my thoughts are. Implement the Dynamic Time Warping error comparison method and include the results in the report.
3 Develop a MATLAB utility to do the manual initialization for the 3D PLS_RBPF_25D tracking.
4 The arrangement of my work in the three directions over the next five months:
a) 3D tracking. I will do most of the work by myself. I have roughly two full months, Oct and Nov, for testing the algorithm on the new sequences.
b) Medical image. Zheshen will be involved in this project, but I need to take the lead, have my own new ideas and implementations, do most of the experiments and communicate the results to Dr. Li in time.
c) Tactile display. Zheshen will take the lead in this project. I need to do more reading, think about the scenario in which we use a tactile printer rather than a touchable display, and think about how and what we can do based on UW's work. What can we propose to do with a static tactile image printer, and how can we achieve that?
5 Some other stuff. Dr. Li asked me to help Zheshen as much as I can to pass the TA exam so that she will get the financial aid.
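For the Dynamic Time Warping error comparison in item 2, a minimal sketch of the classic dynamic-programming recurrence is given below. It assumes two scalar sequences (for example one joint angle over time, tracked versus ground truth) and an absolute-difference local cost; extending it to full joint-angle vectors would only change the local cost.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two scalar sequences, O(len_a * len_b)."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best alignment cost of seq_a[:i] against seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in seq_a only
                                     cost[i][j - 1],      # advance in seq_b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]
```

A sequence and a time-stretched copy of it, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, get a DTW cost of 0, which is the property that makes it useful for comparing tracks of different speeds.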
-
Progress for NV simulation (II)
2006-06-09
On June 7 I finished the two tasks listed earlier in "Progress for NV simulation".
Now I turn to MI lesion classification.
June 8: perform training data collection.
-
0607 research progress on NV simulation
2006-06-08
Today I finished adding code to consider the background resistance force. The next step is to quantitatively analyze the properties of the NV fractal model. What kind of quantitative properties should be considered then? I think they could be:
1 the fractal dimension of the simulated NV
2 the acceptance rate or rejection rate, which should definitely be considered
3 what kind of factors affect the judgement of a human observer as to whether the simulated NV is indistinguishable from the real ones?
Some research needs to be done on this.
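One common way to estimate the fractal dimension in item 1 is box counting: count the occupied grid cells N(s) for several box sizes s and fit the slope of log N(s) against log(1/s). The sketch below assumes the simulated NV is available as a set of integer pixel coordinates. Box counting is my choice for illustration here, not necessarily the estimator that will end up in the paper.

```python
import math


def box_count(points, box_size):
    """Number of box_size x box_size grid cells containing at least one point."""
    return len({(x // box_size, y // box_size) for x, y in points})


def box_counting_dimension(points, box_sizes):
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(points, s)) for s in box_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator
```

As a sanity check, a straight line of pixels should come out close to dimension 1 and a filled square close to 2; a branching vessel pattern would be expected to fall in between.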







