Let’s talk about the reports

It has been a long time since my last post. I’m a lazy writer after all. 😉

I have been working on something interesting recently, very interesting indeed, but in this article the part I want to talk about is the reports.

1. The requirements of the report

Let’s start with the requirements part.

For the reports, I have these requirements:

  1. The data of the report should be loaded from the database, but the report can also be generated from Java’s Map and POJO.
  2. It should respect the privileges; the privileges should come from some authentication service and be enforced using AOP, for example Spring Security
  3. The report should be viewable on different kinds of devices, for example PDF for printers and HTML for the web interface
  4. The report engine’s API should be as easy as possible to develop against
  5. It should be easy to design and implement the report template
  6. It should be easy to test and render the report template
  7. Some reports need precision printing: for example, every cell in the grid must be exactly 1mm wide and 1mm high, because this is the standard for these reports

And, as usual, I’ll explain each requirement in more detail below.

1.1. The data of the report should be loaded from the database, but the report can also be generated from Java’s Map and POJO.

This seems like the most pointless requirement to state. Yes, nearly every report engine supports this. So I’ll move on to the next requirement.

1.2. It should respect the privileges; the privileges should come from some authentication service and be enforced using AOP, for example Spring Security

There will be lots of users in this system, organized in a tree-style privilege model. For example, an organization-level user has more privileges than a department-level user. The system can enforce these privileges through the container’s AOP security support (such as Spring Security). But most report engines know nothing about security, and don’t even support AOP by default.

But if the report service doesn’t respect the system’s privilege management, anyone who knows the report API can access reports and data that they have no authority to view. For this system, that is a big security flaw.

And having the authority to view a report is not the whole story. A department manager, for example, has the authority to view the report for his own department, but no authority to view other departments’ data.

So even when the user has the authority to view the report, the data retrieval part must consider the privileges too.

Let’s make it short:

  • The report engine should support restricting access to the report templates, using the privileges to filter the access
  • The report engine should support restricting the data access of a report, using the privileges to filter the data
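
These two rules can be sketched in plain Java. This is only an illustration under my own assumptions: the names ReportGuard, User and Row are made up for this example, and in a real system the checks would sit behind the AOP security layer (Spring Security, for example) instead of being called by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the two privilege checks; none of these names come from a real library. */
class ReportGuard {

    /** A user who may open some report templates and see some departments' data. */
    static class User {
        final Set<String> allowedReports;
        final Set<String> visibleDepartments;

        User(Set<String> allowedReports, Set<String> visibleDepartments) {
            this.allowedReports = allowedReports;
            this.visibleDepartments = visibleDepartments;
        }
    }

    /** One row of report data, tagged with the department that owns it. */
    static class Row {
        final String department;
        final int amount;

        Row(String department, int amount) {
            this.department = department;
            this.amount = amount;
        }
    }

    /** Check 1: may this user open the report template at all? */
    static void checkTemplateAccess(User user, String reportName) {
        if (!user.allowedReports.contains(reportName)) {
            throw new SecurityException("No access to report: " + reportName);
        }
    }

    /** Check 2: keep only the rows this user is allowed to see. */
    static List<Row> filterRows(User user, List<Row> rows) {
        List<Row> visible = new ArrayList<>();
        for (Row row : rows) {
            if (user.visibleDepartments.contains(row.department)) {
                visible.add(row);
            }
        }
        return visible;
    }

    /** Both checks together: first the template, then the data. */
    static List<Row> loadReportData(User user, String reportName, List<Row> allRows) {
        checkTemplateAccess(user, reportName);
        return filterRows(user, allRows);
    }
}
```

A department manager who is allowed to open the "sales" report and to see only "dept-a" would get the dept-a rows from loadReportData, and a SecurityException for any report outside his allowed set.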

1.3. The report should be viewable on different kinds of devices, for example PDF for printers and HTML for the web interface

This requirement is quite crucial. The users of this system should be able to view the reports on phone, tablet and PC, so basic HTML support is a must-have.

But then they’ll also need to print the report on paper, and it should be pretty-printed (which includes a nice report template design).

Implementing the same report once per output format is a foolish idea. The report engine must have a renderer for each device, and ideally more, for example rendering the report to an image.

1.4. The report engine’s API should be as easy as possible to develop against

This is every programmer’s most basic requirement, so I’ll move on to the next one.

1.5. It should be easy to design and implement the report template

Yes, this is a very important requirement. As a programmer, I’m not afraid to code, but I’m not a good designer, and believe me, all the report designers suck.

They really suck: they are not only stupid and hard to use, they all invent some kind of foolish concepts that they think a novice user will understand, but a novice doesn’t.

Just think of the Swing-based visual designers for Swing and you’ll understand what I mean. I prefer writing code to using these tools. They are all slow and stupid, and do nothing except slow you down.

1.6. It should be easy to test and render the report template

I don’t believe in WYSIWYG, and I suggest you don’t believe in it either.

What I believe in is getting the result as soon as possible while I’m developing the report template, rendered the way it will be in the live environment, so that I can see what to do and what to change.

Let’s make an estimate.

Say a report takes about 1000 operations (creating, modifying and deleting components in the template), and you need to render the report once every 10 operations.

So you’ll need to render the template 100 times to complete it (in a real situation it will be more, since you will be tweaking small details of the components to make them prettier).

This is where the efficiency of the report rendering shows.

Say each render takes 2 seconds longer than it has to.

Then 2 * 100 = 200 seconds, more than three minutes of pure waiting on a single report. And you know the real cost will be at least 4 times larger: 2 seconds is an awkward length of time, so you switch to another task as soon as the render starts, and by the time you switch back many more seconds have passed. Add the extra renders for small tweaks and you can easily lose half an hour on one report.

Don’t underestimate these seconds: they can effectively delay your launch and make you frustrated very quickly. You should have a better life somewhere, instead of wasting time struggling with the report template.
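
To plug in your own numbers, the estimate can be written as a tiny program. The class and method names are made up for this example, and the factor of 4 for task switching is only my rough guess, not a measurement.

```java
/** Hypothetical helper for the back-of-the-envelope estimate; the inputs are the numbers from the text. */
class RenderCost {

    /** Seconds spent waiting when the template is rendered once every operationsPerRender operations. */
    static long waitingSeconds(int operations, int operationsPerRender, int extraSecondsPerRender) {
        int renders = operations / operationsPerRender;
        return (long) renders * extraSecondsPerRender;
    }

    public static void main(String[] args) {
        long pureWaiting = waitingSeconds(1000, 10, 2);  // 100 renders * 2 seconds
        long withTaskSwitching = pureWaiting * 4;        // rough guess: at least 4 times larger
        System.out.println("pure waiting: " + pureWaiting + " s");
        System.out.println("with task switching: " + withTaskSwitching + " s");
    }
}
```

With the numbers above this prints 200 s of pure waiting and 800 s once task switching is counted, before any of the extra renders for small tweaks.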

1.7. Some reports need precision printing: for example, every cell in the grid must be exactly 1mm wide and 1mm high, because this is the standard for these reports

This is a requirement from the users of my system. And it is a sane requirement: that’s what reports are for. And we can indeed make this possible with nearly all the common report engines.

These are the foundational requirements for the report engine.

2. The solution for the report

So, after the requirements, let’s face the solution for the reports.

2.1. The report engine

For the report engine, I chose JasperReports, because it is written in Java (the language I wrote my system in). It supports JDBC’s ResultSet, Map and Java POJOs as its DataSource.

And Jasper is very stable: it builds its functionality on some very stable libraries, it is used by lots of people, and it is open source.

This is my choice. You can make your own choice, but it must support dynamic reports.

2.2. The dynamic report support

As I said above, I don’t expect JasperReports to respect my application’s privilege management, so I need to code that myself. How can I implement it?

By writing my own wrapper for the dynamic report interface on top of Jasper.

I’ll explain the details here:

  1. All the report requests and APIs are created by myself, and every service call goes through the AOP security layer, so I can apply whatever security filters I want to make the system secure, and disable them in development mode to make my testing easier. Since all the API is my own, I have complete control over what to do and how to do it
  2. My code is based on a dynamic report library on top of Jasper’s engine API. It is a wrapper layer that generates Jasper’s report templates through an API, so I can create and modify Jasper report templates on the fly in code. This is quite useful for my requirements: the system can modify the appearance of a report at runtime from request parameters or configuration. That makes the reports more intelligent, and I can write fewer reports for the users, since similar reports can be combined into one
  3. The dynamic report library lets me write the report in code, not in the FUCKING report designer. Believe me, I prefer writing code to using that XML-based FUCKING designer; it is stupid and useless

2.3. The all-in-one device support of JasperReports

Jasper handles reports in a very brilliant way. It uses Java’s Graphics2D layer as the standard abstraction layer for rendering.

And Java’s Graphics2D is a good abstraction for rendering after all (and a de facto standard in Java, the foundation of Swing).

So, because Jasper bases its rendering on Graphics2D, it gets every type of image output by default, since the JRE already has them in ImageIO.

Then, Jasper uses the splendid iText library to create PDF files. iText provides a Graphics2D implementation that renders the result into PDF.

With this, Jasper can render the report into beautiful PDF files.

And not only that: Jasper can use Apache Batik to render the report into SVG, so we can render the report as a vector document with the same appearance.

Jasper provides its own implementation for HTML too. It only supports the logical elements natively; any component that needs graphics is rendered to an image, and an image servlet serves those images.

I don’t want to write too much about Jasper in this document, but I really like Jasper’s way of doing reports.

Why? Because Jasper uses Graphics2D as its rendering engine, a Jasper report can be rendered in Swing natively and perfectly, since Swing is based on Graphics2D too. Jasper uses this for its report viewer. I’ll talk about that later.

2.4. The easy to write API of DynamicReports

I use DynamicReports to create the report templates.

Yes, I know about DynamicJasper, and you can choose whichever library you like. For me DynamicReports is good enough, and its API is quite pleasant to use.

So I won’t compare the APIs of these libraries. Being able to write the report template in code, with an API that is nice to use, is enough for me. You can pick any one you like.

2.5. The work flow for me to develop the report templates

  1. You should mock the data to render the report instead of making the report read the data from the database. This will save at least 0.5 seconds per render of your template, depending on how you connect to your database

  2. You should use the Swing report viewer to review the report instead of rendering it to PDF or HTML. If you want to render the report to HTML, you’ll need to set up a Java EE server and the web application, or you won’t see any charts (they need to be rendered as images, which requires configuring the image servlet; and even after you configure it, you need to put the JasperPrint object into the session, which means you must open the HTML through the web server, or there is no session at all). And the PDF route is slower than the Swing viewer, since PDF goes through the iText library while the Swing one uses Graphics2D directly, so its render performance is better.

  3. You should not start the Jasper engine every time you render the template. The Jasper engine’s startup is quite slow: it needs about 2 seconds before rendering.

But we hit a dead end here: the report template is dynamic and written in Java code, so we can’t really deploy it to JasperReports Server, since JasperReports Server only supports static XML report templates.

And since the report template is a Java class, it changes every time we update the code, and we need to reload it into the running viewer and render it again. How can we get this done?

Here is my solution:

3. The solution for writing and reviewing the report template faster

This is the most valuable part of this post, and the reason I wrote it. If you have read this far, please read this section carefully: it contains my advice for designing and debugging report templates written with the dynamic report API.

  1. Use the JVM’s HotSwap function to load the changed report code automatically. Use an IDE that supports the JVM’s HotSwap, Eclipse for example. It will swap the code in the running JVM for the new code you have changed, so you can keep the report viewer open and see the change to your report template without restarting the Jasper engine or the viewer form. But the default HotSwap support of the JVM is quite lame: it only supports changes to method bodies, not class reloading, so you’ll need to patch it. With HotSwapAgent’s support, HotSwap becomes as capable as you need. With this help, the report template’s code in the viewer is updated every time you change and save the code (it must compile successfully).

  2. Write your own report viewer that watches for code changes and updates the report view automatically. Jasper’s report viewer doesn’t support reloading a dynamic report. So even when the code has changed, you won’t see the update unless what you changed is in the repainted area and is not part of the logic. To avoid this, write your own report viewer and watch for code changes using Java 7 NIO’s file system event support. I just watch all the Java source files in the source folder, and if any file changes, I reload the report.

It’s a pity that HotSwapAgent doesn’t have any JVM-level event to tell me which class was reloaded; otherwise I wouldn’t need to watch the source files at all, I could just listen for its event.

So if HotSwapAgent could offer some API for this, it would be quite good for me. 😀
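
Here is a minimal sketch of the file watching part, using the Java 7 NIO WatchService. The class name SourceWatcher and the onChange callback are my own placeholders, not part of Jasper or HotSwapAgent; in a real viewer the callback would rebuild the report and repaint the component. Note that WatchService is not recursive, so every sub-directory has to be registered one by one.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

/** Hypothetical helper: watches a source tree and calls back when a .java file changes. */
class SourceWatcher {

    /** Only Java source files should trigger a report reload. */
    static boolean isJavaSource(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().endsWith(".java");
    }

    /** Registers the root directory and every directory below it with the watcher. */
    static void registerTree(Path root, final WatchService watcher) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    /** Blocks forever; runs onChange once per batch of events that touched a .java file. */
    static void watch(Path root, Runnable onChange) throws IOException, InterruptedException {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            registerTree(root, watcher);
            while (true) {
                WatchKey key = watcher.take();           // wait for the next batch of events
                boolean changed = false;
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == OVERFLOW) {      // events were lost: reload to be safe
                        changed = true;
                        continue;
                    }
                    if (isJavaSource((Path) event.context())) {
                        changed = true;
                    }
                }
                if (changed) {
                    onChange.run();
                }
                key.reset();                             // re-arm the key for further events
            }
        }
    }
}
```

A preview application could run SourceWatcher.watch on a background thread, passing the source folder and a callback that calls whatever reload method its own viewer has.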

After all this, writing reports with DynamicReports is quite pleasant.

Here is my workflow.

  1. Open the project in Eclipse
  2. Start the report preview application in debug mode (in this mode, Eclipse enables the JVM’s HotSwap)
  3. Write and change the Java code for the report, then click save, and wait for the report preview application to get the file system event and reload the report (and force a repaint of the whole component to avoid stale painting)

These steps take only a few milliseconds; compared to the original 2 seconds or more, it is fast like lightning.

And since the report that I see is the final result, this is now truly what-you-see-is-what-you-get for me.

So, this is how I create reports in my project. I hope this helps you with these problems and makes your life easier.

The ES6 Development Scheme For React Using GNU Make

ES6 is so much better than current JavaScript that I can’t write a line of JavaScript without it. The problem is that mainstream browsers, and even the most recent version of NodeJS (!), don’t fully support it.

Yes, there is a lot of work to be done before these engines support ES6, but most JavaScript developers are as eager as I am.

So they build shims and polyfills for it. But what about the syntax changes in ES6?

This is a little complicated: current JavaScript engines won’t accept the new syntax, so there must be some kind of translation or compilation.

And there is. Babel is one such compiler, and the one I use most in my workflow (there is another reason I use Babel as the compiler: it supports React’s JSX officially; I’ll talk about that later).

The Problems To Face When Using ES6

Choosing ES6 as the main language for developing a JavaScript application is not easy. You’ll face many problems, listed below:

  1. Every output file must be compiled, or your application won’t run on any browser or on NodeJS
  2. Since all the output files are compiled, you face the same problem as in C: you can’t debug the running code unless some kind of debug data exists, because the resulting code (compiled and optimised) is quite different from the original
  3. There is no official JavaScript dependency management (for the browser), so you must choose one (for example bower or npm), and of course the packages are not ES6 (since they are JavaScript that runs on current engines)
  4. With all these files, including them and tracking their changes is quite complex

The Solution

Babel can compile an ES6 file and create a source map, so you can use mainstream browsers (Firefox or Chrome) to debug the original code (yeah!).

Now let’s face the problems again:

  1. Must compile source to destination code
  2. Must compile and create the source map file/data
  3. Files are stored as trees of directories, so you must track every file’s changes to avoid rebuilding the whole directory when only one small file changes

What development scheme does this remind you of? It’s C!

For C development, we use a Makefile and Bash to do that, and it works perfectly (yes, I know plenty of people use Grunt for this, but GNU Make and Bash can do a lot better than that).

But what about problem 3?

For C it is quite easy: every file is compiled to an object file (.o), and then there is another build step, called linking.

And is there any tool in JavaScript that supports linking? Yes, the tool I use is Browserify.

So the solution to the problems above should be something like this (exactly the same as in C; I’ll add the C counterpart for comparison):

  1. Compile the code and generate the debug information (Babel – GCC)
  2. Track file changes and do incremental compilation (Makefile)
  3. Complex tests and directory operations (Bash)
  4. Link the output files and the required libraries together into the product (Browserify – the linker)

And Beyond That

Going beyond that, you can have the style files compiled too (I use Sass to compile the style code).

Even more, you can add a phase for uglifying the JavaScript and CSS, and a deployment phase, to your Makefile as well.

The Details

First, you must have GNU Make installed on your system.

This should be quite easy: you can use any package manager to install it (including Cygwin). The version I installed, using MacPorts, is GNU Make 3.81.

Then you’ll need to set up the compiling environment:

  • For ES6 compilation, you’ll need
    • NodeJS: This is the JavaScript runtime based on Google’s V8. It is platform independent; you can install it on almost every modern OS
    • NPM: The package manager of Node. It is bundled with the NodeJS installation, so you won’t need to install it separately.
    • Babel: The ES6 compiler that I use. It can be installed with npm like this:
      npm install babel
  • For the result linking you’ll need:
    • Browserify: The tool used to package all the JavaScript dependencies together into a single JavaScript file (I’ll talk about the benefits and drawbacks of a single JavaScript file and of browserify in another post). You can install browserify with this command: npm install browserify

And there you go: you can now use GNU Make to build your project.

The Make patterns

The first thing needed for compiling is a pattern rule that does the compilation (just like in C, where source files use the file extension .c and the output files use the file extension .o).

I set up the pattern following this scheme:

  • The dependencies are managed with NPM, so all the dependencies are standard JavaScript (.js) files, loaded with CommonJS’s require (but Babel can make this better: you can just use the import keyword, and Babel will compile it to code that works with CommonJS)
  • All the JavaScript source files have the .jsx file extension, to distinguish them from the .js outputs. There are two reasons for using the .jsx file extension:
    • Babel can compile the JSX code for React, so it is very natural
    • You can make your JSX editor plugin (for example a Vim plugin) recognise these files, and since JSX is a superset of ES6, your editor will then support your source files automatically
  • All the compiled results use the .js file extension and keep the same file name

So the pattern to compile all the source files is something like this:

%.js: %.jsx
        $(SILENT) $(BABEL) -s inline -o $@ $<

The meaning of this pattern is like this:

  • Any .js output file can be compiled from the .jsx source file of the same name. For example, if you want to make a file named hello.js, make will look for the source file named hello.jsx and compile it with the command under the rule to produce hello.js
  • $(SILENT) is a make trick for controlling the debug output; let’s ignore it and assume it is blank
  • The command is $(BABEL) -s inline -o $@ $<. It tells babel to compile the source map inline into the output file; $@ means the output file name and $< means the input file name

So if you have this pattern in your Makefile, you can make any JavaScript file by compiling the .jsx file of the same name, and the result is stored in the same folder.
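
Coming from Java, I find it helps to spell out what make does with such a rule. The sketch below is only my own model of it, with made-up names (PatternRule, targetFor, needsRebuild), not make’s actual implementation: the part of the name matched by % gives the target name, and the target is rebuilt when it is missing or older than its source.

```java
/** Hypothetical model of the check make performs for the rule "%.js: %.jsx". */
class PatternRule {

    /** Maps a source name to its target name, the way the % stem does in the pattern. */
    static String targetFor(String source) {
        if (!source.endsWith(".jsx")) {
            throw new IllegalArgumentException("not a .jsx source: " + source);
        }
        return source.substring(0, source.length() - ".jsx".length()) + ".js";
    }

    /**
     * A target is rebuilt when it is missing or older than any prerequisite.
     * Times are last-modified timestamps; a negative target time means "file is missing".
     */
    static boolean needsRebuild(long targetTime, long... prerequisiteTimes) {
        if (targetTime < 0) {
            return true;
        }
        for (long prerequisiteTime : prerequisiteTimes) {
            if (prerequisiteTime > targetTime) {
                return true;
            }
        }
        return false;
    }
}
```

This is why saving one small .jsx file only recompiles that one file: every other .js target is still newer than its source.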

The Source Files And How To Build The Dist File

But how do you specify all the source files so that make compiles them? You can do it this way (straightforward, but not wise):

SOURCE_FILES := hello.js world.js bye.js main.js

This is a little stupid, but straightforward.

Is there a better way to handle this? I use a macro for it, something like this:

rwildcard=$(foreach d,$(wildcard $1*),$(call rwildcard,$d/,$2) $(filter $(subst *,%,$2),$d))

SRC_DIR := src
SRC_FILES := $(call rwildcard,$(SRC_DIR)/,*.jsx)

These settings put all the .jsx files under the src folder into the variable SRC_FILES. Then you can use this variable in the build task.

# build.js depends on the compiled .js files, so make runs the %.js: %.jsx rule first
build.js: $(SRC_FILES:.jsx=.js)
	$(SILENT) $(BROWSERIFY) $(SRC_DIR)/app.js -o build.js

You can even add an uglified version of build.js:

build.ugly.js: build.js
    $(SILENT) $(UGLIFY) build.js -o build.ugly.js

How About Unit Testing?

For unit testing, we can just use Jasmine.

And use variables and tasks like this:

SPEC_DIR := spec
SPEC_FILES := $(call rwildcard,$(SPEC_DIR)/,*.jsx)

test: $(SPEC_FILES)

The problem

So far there is one problem with this approach: browserify’s linker is quite, quite, quite slow.

For my project, it costs at least 6 seconds on a 2015 Retina MacBook Pro to build the result. This is unbearable, so I decided to write a new linker in Go (just the linker, which supports CMD-style requires and adds the glue code to make everything work in a single file), and that will make it faster.

This will take a little time to finish, so until then I’ll just use browserify to do the job.

What Next?

So much for this blog post. I’ll write another post about the project setup, and after cleaning up the code, I’ll create a project bootstrap on github.com to let you see the benefits of this method.

Let’s talk about Autotools

Coming from a Java background, the most difficult part of writing C/C++ programs (or shared libraries) is making the code run on other machines (a live server, for example).

The Java virtual machine’s architecture really saves lots of people lots of time: you can just compile the code on your DEV machine and deploy (mostly copy) the jar (or war, or even ear) to the destination machine, and you’re done.

So you can test the same code that will run in production, and copy everything it depends on along with it (in a war, for example).

When using C/C++, you don’t have these things; you have to know very well everything your code depends on.

The Problem

Why did I need to write a C/C++ shared library? The story begins with a little request to write a SMILES tokenizer for MySQL. The reason why I needed that tokenizer is another story 😉

To compile that plugin’s code you’ll need:

  1. OpenBabel [headers and shared library are needed]: The foundation of the conversion. I need it to convert SMILES to molecule structures so that I can tokenize them with better context
  2. MySQL [headers are needed]: Yes, there must be a MySQL installation on the server, and since MySQL has a very nice plugin architecture, I don’t need to link against any MySQL libraries, oh yeah!

Not so hard, it seems.

But you are wrong:

Headers are not so easy to find.

Different systems, different versions and different distributions (even different installation methods) will put the headers you need in different folders.

Take OpenBabel for example:

  1. Libtool’s default location (when installing from source) puts the headers in /usr/local/include/openbabel-2.0 (yes, we’re using the openbabel 2.0 API)
  2. If you’re using a Fedora-like system (CentOS, for example) and install openbabel-devel with yum, you’ll find the headers in /usr/include/openbabel-2.0
  3. If, like me, you use OS X for development and install openbabel with MacPorts, you’ll find the headers in /opt/local/include/openbabel-2.0

Yes, for a very limited set of systems (only CentOS and OS X), you get at least 3 different locations for the headers you need, and the user may change the default path too.

And in the worst case, the server may not have OpenBabel installed at all, and you’ll have to inform the user that you need it.

The Libraries That You Want To Link Are Not Easy To Find Either

Like headers, libraries are quite hard to find too, because:

  1. With libtool’s default location, the static library will be in /usr/local/lib with a name like libxxx.a, and the dynamic library will be in the same location with a name like libxxx.so (or libxxx.dylib on OS X)
  2. If you install the library using yum, it’ll be in /usr/lib
  3. If you install the library using MacPorts, it’ll be in /opt/local/lib

That’s not all: on Fedora, if you are using a 64-bit OS, the 64-bit libraries are located in /usr/lib64.

And in the worst case, the server may not have the libraries you need installed at all.

The Deployment Location Is Uncertain

Since I’m writing a MySQL plugin, what I want the make install target to do is install the plugin into MySQL’s plugin folder.

But with different MySQL installations and different systems, even the default plugin folder can be quite different.

And, even worse, there may be no MySQL plugin folder at all.

The Function You Need May Not Exist

Yes, that’s still not all of the problems. In another application of mine, I ran into the problem that some API I used in OpenBabel 2.3.2 (from MacPorts) does not exist in 2.2.3 (from CentOS 6’s EPEL yum repository). So I must disable some functions when compiling my code on a system that doesn’t support the OpenBabel 2.3.2 API, and still let the other functions work.

I needed a better way to do this.

My Solution

So I came to GNU’s Autotools. The reason I chose it is that MacPorts uses it by default, and PHP uses it to build plugins 😀

Then I found out that Autotools is so hard to use, especially for newbies…

This post assumes you have a little background knowledge of GNU Make and know how to write Makefiles.

Problems When I Use Autotools For The First Time

  1. What’s the workflow when using Autotools?
  2. How should I start?
  3. What commands should I use, and how?
  4. Which files do I need to write?
  5. If I want to write a shared library (like a MySQL plugin), what should I do?
  6. What are AC macros? Where is the Fking documentation for the Fking AC macros?

These problems are the motivation for this post.

Because it’s TOO HARD for beginners!!!! There is very little documentation on Autotools for beginners, and the official documentation is a piece of SHIT!

This will scare most beginners away from it! I’ll try to make it a little simpler so that beginners can start to play with Autotools.

What’s the workflow for using Autotools? How should I start? What commands should I use?

Autotools is a set of tools that helps you write code that adapts to different systems and installations. It can be broken down into these commands:

The Commands

  1. autoscan: This program scans all of your code and generates a boilerplate configuration for Autotools (configure.scan, which you rename to configure.ac)
  2. aclocal: Generates autoconf’s local macros (aclocal.m4). If you don’t run this command to generate the macros, your autoconf run will probably fail with a “macro is not defined” error
  3. autoheader: Uses the configuration in configure.ac to generate your config.h.in (the template from which configure generates config.h)
  4. autoconf: This is the core of the macro support for resolving headers and libraries. This command uses the configuration in configure.ac to generate the configure script
  5. automake: Takes the Makefile configuration file Makefile.am and generates the Makefile template Makefile.in

And that’s not all: if you want to write a shared library, you’ll need this:

  1. libtool: The command line tool to create and install libraries. Autotools supports it by default (sure, they are from the same organization, aren’t they?)

So there are at least 6 commands you should know, and I’ll list the files that you should write or will get (for beginners, this is quite difficult):

The Files

  • configure.scan: This is the output file of autoscan; you can rename it to configure.ac (it gives you some boilerplate)
  • configure.ac: This file is very important: it is the core configuration file of your Autotools build system, and nearly every magic part of Autotools is configured here (using M4 macros)
  • aclocal.m4: This file is generated by aclocal, which reads configure.ac and collects the definitions of the macros you’ll need (for example, the automake macros and libtool macros). It is quite, quite important for running the other Autotools commands
  • config.h.in: This file is generated by the autoheader command. It is the template that configure turns into config.h
  • Makefile.am: This is the Makefile template that you need to write. In it you define the targets and the variables (but I strongly suggest you define the variables in configure.ac and let Autotools write them into your Makefile automatically; I’ll discuss this later)
  • Makefile.in: This file is generated by automake. It is the template for the final Makefile (without paths, since the path resolving is done by configure)
  • configure: This is the final product of Autotools, since every other product is either generated by this script or used to produce it. This script is created by the autoconf command
  • config.status: This script is generated by configure, and it generates the final config.h and Makefile
  • config.h: This is the core of portability. All the detection results end up as macros in this header file, so you can use those macros in your code to do the tricks (for example, leave out some functions if some function is missing, or, on Windows, use some F**king API instead of the POSIX API)
  • Makefile: Ah~~~ At last, we come to a file that means something…..

See? That’s why I said Autotools is quite hard for beginners. It has 6 commands (7, including libtoolize) and 10 kinds of files (input, output, or both).

The Workflow

I’ll just describe the workflow for shared library development (since it is the more complex one).

  1. Run autoscan to generate the configure.scan
  2. Rename configure.scan to configure.ac
  3. Run libtoolize --force to add the libtool support (add the --install option to let libtoolize copy in its missing auxiliary files; automake will also expect AUTHORS, COPYING, ChangeLog, INSTALL, NEWS and README files in the folder later)
  4. You’ll need to enable libtool and automake in your configuration, so add this code to your configure.ac

    AM_INIT_AUTOMAKE # This macro initializes automake
    AC_ENABLE_SHARED # This macro configures libtool to build shared libraries
    LT_INIT # This macro initializes libtool
    AC_CONFIG_MACRO_DIR([m4]) # This makes the libtool macros in m4/ available to your autoconf configuration file
    AC_CONFIG_FILES([Makefile src/Makefile]) # The Makefiles that configure should generate for you
    AC_OUTPUT # This makes configure write the files listed above

  5. Run autoheader to generate config.h.in

  6. Create your own Makefile.am (you can see here for an example of writing a program’s Makefile.am). For a shared library, you should use this code (if you want to install the library to a destination other than /usr/local/lib):

    pkgplugin_LTLIBRARIES= xxx.la
    xxx_la_SOURCES = xxx.h xxx.c

  7. Run command autoconf to generate the configure script
  8. Run command automake to generate the Makefile.in (you may need the --add-missing option to add the missing files)
  9. Run command automake to generate the Makefile.in
  10. You’re almost done: now you can run ./configure to generate the config.h and Makefile

Yes, 10 steps. 2 kinds of files to write (configure.ac and Makefile.am).

The workflow will be like the image below:

Autotool Workflow


  1. Use the AC_MSG_CHECKING macro to tell your user what is being checked, like this: AC_MSG_CHECKING(F**king Windows API)
  2. Use the AC_MSG_RESULT macro to tell your user the result of the check, like this: AC_MSG_RESULT(Yes, you are using F**king Windows XP)
  3. Use the AC_MSG_WARN macro to warn the user that some function is not going to work, without stopping the checks: AC_MSG_WARN(I’m afraid some is not going to work….)
  4. Use the AC_MSG_ERROR macro to stop the flow and let the user install the dependencies, like this: AC_MSG_ERROR(You should at least to have brain to go on)
  5. If you are using F**king C++, you can’t use AC_CHECK_LIB, since C++ mangles its symbol names…. You should use AC_LINK_IFELSE to do this; the details are here
  6. Use the AC_SUBST macro to add variables to your Makefile, like this: AC_SUBST([stair_to_heaven], [not exists])

The complete documentation for Autoconf Macros is here, help yourself.


Thanks for watching….

Why we need another data processing framework


I have had a lot of data processing work to do recently.

Yes, A LOT of data processing work.

I have written a processing framework based on Rhino and Spring, called Jersey, which means JavaScript with ease.

It is fun to do data processing with Jersey, but it has 2 shortcomings:

  1. The startup time for Jersey is too long: it needs about 2 seconds to start up the context (sure, you need to start the Java virtual machine, initialise the Rhino runtime and then start up the Spring container, so 2 seconds is not so bad). But it is nearly unbearable when I just want to play around with something (yes, Java is stable, but in the run-and-exit scheme IT IS REALLY SLOW. Why? There is always lots of bootstrapping; yes, I know that’s for flexibility, but it is really slow, man!).
  2. The memory footprint of Jersey is too large. The JVM always wants more memory: I can write a Python crawler, run it with a thread group of about 10 threads, and it still consumes less memory than the JVM uses for HelloWorld. This is very, very bad, since I need to run as many instances of my crawler as possible

So, I went to Python (2.7) for small tasks (and even some bigger tasks).

Python is a little better and faster, but compared to Jersey, it lacks:

  1. Better Unicode Support: This is fundamental!!!! I don’t get why the Python community ignored this at the very beginning. I can’t open a CSV file properly without using a third-party library
  2. Fast MySQL Driver: I tried pymysql (didn’t have time to try others) and found it a little slow; I’ll explain this in another blog post
  3. Libraries: Sure, Python is a good language, and many people use it to do serious things. But compared to Java, the libraries are still not enough, at least for my data processing work
  4. Not consistent for me: I’m working on a PHP framework for building websites (and a CMS based on it) now, so why should I code the data processing tool in Python rather than PHP, when I could reuse the libraries that I wrote for PHP?

So, I gave up Python for processing data.

And decided to give PHP a try.

A little thoughts on data processing framework

After reading the section above, you’ll know why I’m using PHP as the language of my data processing framework (I’ll keep Jersey working though. 🙂 )

And here are some thoughts on what a data processing framework should do (at least for me):

It should connect to most of the popular datasources

This is the fundamental part of the framework.

No matter how good your framework is, it is still useless if it can’t even connect to MySQL or Postgres.

And for nowadays, it should have mature libraries or drivers to connect to the nosql data storage(like Solr, MongoDB etc.), make the data transfer fast and safe.

It should be based on a scripting language

This is the same as Jersey. For a data processing framework, testing and adjusting might happen on the live server (or the crawler master), and this is the reason that I hate Hadoop… Why do I need to recompile, package and redeploy the code just to change a tiny bit of the crawler (only to run a small test)? Hadoop’s HDFS is good though.

It should have the ability to run across the platform

This is same as Jersey. That’s why Jersey is based on Java…. Luckily, most scripting language can run on all the major platforms we used today.

It should be very easy to extend and configure

It should be a framework that contains lots of goodies, and everything from the foundation to the libraries should be very flexible to change or override.

So, no matter how complex the requirement is, there is always a better way to base the program on the framework(Eclipse is a good example).

It should run very fast, and have a very small memory footprint

This is the same as the background section, you need to run it and get the result instantly if the processing is easy.

It should have the progress bar support by default

I don’t think I should explain this.

It should embed a fast rule engine

It is very important to embed a fast rule engine into the data processing framework.
Let’s view the basic work flow for data processing:

  1. Load the data from the datasource
  2. Transform the data into a common structured format (most data processing tools use XML)
  3. Process the data
  4. Transform the data into the destination format
  5. Store the data into data destination

For step 1, you need the ability to connect (it has nothing to do with the rule engine)
For step 2, the best transform method is rule based: it is more readable and extendable. I’ll show you a real world example here

Let’s suppose you have a small task: merging the user information collected using OAuth on 2 different platforms (Twitter and Facebook, for example).

Platform 1 (as p1)’s data format is (using JSON):

    {
        "nick": "Jack",
        "profile_image": "a.jpg",
        "birthday": "someday"
    }

And Platform 2 (as p2)’s data format is:

    {
        "screen_name": "Jack",
        "img": "b.jpg",
        "birthday": "someday"
    }

There are lots of records (about 100,000 each). You need to transform them into a standard form


Let’s use PHP and some fake code to do this. The first version is PHP code:

// Map a p1 record (nick / profile_image / birthday) to the standard form.
function processP1($arg) {
    $ret = array();
    if(isset($arg->nick)) {
        $ret['nick'] = $arg->nick;
    }
    if(isset($arg->profile_image)) {
        $ret['profile_img'] = $arg->profile_image;
    }
    if(isset($arg->birthday)) {
        $ret['birthday'] = $arg->birthday;
    }
    return (object) $ret;
}

// Map a p2 record (screen_name / img / birthday) to the standard form.
function processP2($arg) {
    $ret = array();
    if(isset($arg->screen_name)) {
        $ret['nick'] = $arg->screen_name;
    }
    if(isset($arg->img)) {
        $ret['profile_img'] = $arg->img;
    }
    if(isset($arg->birthday)) {
        $ret['birthday'] = $arg->birthday;
    }
    return (object) $ret;
}

The second is CLIPS code:

(defrule set-result-nick-from-nick
    ?a <- (arg nick ?nick&~nil)
    ?r <- (result (nick nil))
    =>
    (retract ?a)
    (modify ?r (nick ?nick)))

(defrule set-result-nick-from-screen-name
    ?a <- (arg screen_name ?nick&~nil)
    ?r <- (result (nick nil))
    =>
    (retract ?a)
    (modify ?r (nick ?nick)))

(defrule set-result-profile-img-from-profile-image
    ?a <- (arg profile_image ?img&~nil)
    ?r <- (result (profile_img nil))
    =>
    (retract ?a)
    (modify ?r (profile_img ?img)))

(defrule set-result-profile-img-from-img
    ?a <- (arg img ?img&~nil)
    ?r <- (result (profile_img nil))
    =>
    (retract ?a)
    (modify ?r (profile_img ?img)))

(defrule set-result-birthday-from-birthday
    ?a <- (arg birthday ?birthday&~nil)
    ?r <- (result (birthday nil))
    =>
    (retract ?a)
    (modify ?r (birthday ?birthday)))

Someone may argue that the first one can be written as one method, like the second one.

But the world is changing: if p1 changes its protocol (say, renames profile_image to img), you will regret jamming them together.

As you can see from the code above, the second one is more concise, and better still, it makes no assumptions about p1 or p2.

So, if later you need to process information from some platform called p3, you won’t need to change your code very much (just add the missing rules, and if you are lucky, you may not need to add any rule at all, since the fields of a user profile are mostly the same).

Steps 3 and 4 are the same as step 2. A rule engine runs faster and works better when you need to write lots of if..then..else.

And it is very easy to read and maintain.
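
The same rule-based idea can be sketched outside CLIPS as well. The snippet below is only a minimal illustration in Python, not how php-clips or clips-tool works: the RULES table and the normalize function are hypothetical names made up for this sketch. Each rule maps one source field to one field of the standard form and knows nothing about p1 or p2, and a target field is only filled while it is still empty, which mirrors the (result (nick nil)) condition in the rules above.

```python
# Hypothetical rule table: (source field, target field of the standard form).
# No rule mentions which platform a record came from.
RULES = [
    ("nick", "nick"),
    ("screen_name", "nick"),
    ("profile_image", "profile_img"),
    ("img", "profile_img"),
    ("birthday", "birthday"),
]


def normalize(record, rules=RULES):
    """Apply every matching rule; a target field is set only while it is empty."""
    result = {}
    for source, target in rules:
        value = record.get(source)
        if value is not None and target not in result:
            result[target] = value
    return result


# The two sample records from the text above.
p1 = {"nick": "Jack", "profile_image": "a.jpg", "birthday": "someday"}
p2 = {"screen_name": "Jack", "img": "b.jpg", "birthday": "someday"}

print(normalize(p1))
print(normalize(p2))
```

With this shape, supporting a p3 that uses a new field name means adding one line to the table, and if p1 renames profile_image to img, nothing needs to change at all.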


For my PHP website framework, I chose CLIPS to do the rule processing, and not only for the business logic.

I used it as the foundation of the framework. Maybe you are curious about the design: why should I use a rule engine as the foundation of the framework?

Here are some examples.

  1. The core rules to load configuration: Where to load the configuration from seems to be very tricky. If a framework is flexible, it can load from lots of places, and where to look is configurable too
  2. The core rules to load PHP scripts: This is the most fundamental part of every PHP framework. If you think this should be very easy, take CI’s CI_Loader as an example, try to read it to understand the routine, and if you dare, try to add one more rule. 😀

So, I wrote a plugin for PHP first, called php-clips. It is nearly stable for now (it can be compiled and installed using PHP’s build tools).

And I’m trying to write a PHP framework to implement my thoughts above. This framework has these features:

  1. Embeds CLIPS as its core
  2. Can be run at the command line as an application
  3. Just like Jersey, it will load the classes and extensions in the working directory, or any configured directory (can be configured by the system wide configuration /etc/… or by finding the path from an environment variable; sure, this is configurable too 😉 )
  4. You can use the CLIPS engine anytime, and even open a console (using PHP readline) to run CLIPS commands yourself manually
  5. It can run CLIPS scripts directly; if you want, you don’t need to write 1 line of PHP
  6. You can replace any fundamental part of the framework just by overriding it (no need to replace the script; just like CI, you can have MY_XXX to replace the original classes, any class, and yes, this is configurable too. 😉 )
  7. It is written following CI’s guidelines, so you’ll find the API and even the folder structure are like CI’s, but with the rule engine CLIPS as its core
  8. It uses mustache as the template engine, simple and fast
  9. It has a resource scheme and handler design just like Spring, and you can write your own handler using PHP’s resource scheme and handler design too
  10. It’ll use Console-ProgressBar based on Curses to show the progress bar (the same progress bar as PEAR)
  11. It’ll be distributed using PEAR

This little toy can be found at clips-tool, it is functional now.

It is still in development, so it really lacks documentation. I’ll improve the documentation when the current data processing work is done.

Complaints about GCC

I’m working on a small Chinese text parser plugin for MySQL 5.6, I’ll write another blog post about it.

The problem that made me write this blog post is quite a simple one.

Since I’m working on the Chinese text parser, I need a Chinese tokenizer library written in C.

I chose Libmmseg; as stated above, I’ll write about it in another blog post.

I tried to write a small application using libmmseg, and got it to compile on my MBP.

Then I needed to transfer the application to a Linux system (CentOS 6.5) and make it compile again.

Then I hit the problem.

The problem is something like this:

/tmp/ccsP9If4.o: In function `segment(char const*, css::Segmenter*)':
mmseg_test.cc:(.text+0x1af): undefined reference to `css::Segmenter::setBuffer(unsigned char*, unsigned int)'
mmseg_test.cc:(.text+0x1e6): undefined reference to `css::Segmenter::peekToken(unsigned short&, unsigned short&, unsigned short)'
mmseg_test.cc:(.text+0x207): undefined reference to `css::Segmenter::popToken(unsigned short, unsigned short)'
mmseg_test.cc:(.text+0x267): undefined reference to `css::Segmenter::peekToken(unsigned short&, unsigned short&, unsigned short)'
mmseg_test.cc:(.text+0x2af): undefined reference to `css::Segmenter::popToken(unsigned short, unsigned short)'
mmseg_test.cc:(.text+0x30b): undefined reference to `css::Segmenter::thesaurus(char const*, unsigned short)'

That’s quite strange, since I have compiled libmmseg, installed it to /usr/local/lib, and added the dependency, but ld still can’t find the library…… WTF!!!!

I tried lots of ways. I even compiled and tried to link the objects by hand, and it still couldn’t be done.

I was frustrated.

Then I saw this.

It said:

The trick here is to put the library AFTER the module you are compiling. The problem is a reference thing. The linker resolves references in order, so when the library is BEFORE the module being compiled, the linker gets confused and does not think that any of the functions in the library are needed. By putting the library AFTER the module, the references to the library in the module are resolved by the linker.
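
To make the quoted explanation concrete, here is a very simplified single-pass model of that behaviour, written in Python. It is only a sketch of the idea and not how ld is implemented: the link function, the input tuples and the shortened symbol names are all hypothetical, and a real linker works per archive member rather than per whole library.

```python
def link(inputs):
    """Simplified single-pass model of left-to-right symbol resolution.

    inputs: list of (kind, defines, needs) tuples in command-line order.
      kind "object"  -> always linked in
      kind "library" -> only pulled in if it defines a symbol that is
                        still undefined when the linker reaches it
    Returns the set of symbols still undefined at the end.
    """
    defined = set()
    undefined = set()
    for kind, defines, needs in inputs:
        if kind == "library" and not (undefined & defines):
            continue  # nothing seen so far needs this library, so it is skipped
        defined |= defines
        undefined |= needs
        undefined -= defined
    return undefined


# Hypothetical stand-ins for mmseg_test.o and the mmseg library.
test_obj = ("object", {"main", "segment"}, {"setBuffer", "peekToken", "popToken"})
mmseg_lib = ("library", {"setBuffer", "peekToken", "popToken", "thesaurus"}, set())

# Library BEFORE the module: the references stay unresolved.
print(sorted(link([mmseg_lib, test_obj])))
# Library AFTER the module: everything resolves.
print(sorted(link([test_obj, mmseg_lib])))
```

In this model the first call leaves setBuffer, peekToken and popToken undefined, which is the same shape as the errors above, while the second call leaves nothing undefined.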

This saved my life!!!!

So, this is the reason I wrote this small blog post: the God D*mned GCC!!!!! Why is it so stupid about this when llvm can handle it beautifully?

I suddenly understand why Apple gave it up and uses llvm instead. Sure, I’ll try to use llvm myself instead of gcc too.

Sorry RMS and GNU.

Thinking about Note taking tools, seriously

Let’s talk about note taking, seriously.

Why do I need to do this? There are lots and lots of note taking applications and tools for OS X.

And even Apple has a nice note taking tool called Notes (yes, maybe not that good), and there is the famous Evernote (they did a very good job on notes, I must admit).

So then, why do I think there is a need for a new note taking tool?

A short answer: they are built for normal users, not for programmers.

And for the long answer, I’ll begin with the requirements of a note taking tool for programmers.

Here are the points:

  1. It (the note taking tool for programmers) must be able to pop up any time you want it, without interrupting you or changing the environment (just like Dash running in headless mode; yes, I like Dash very much!). This can be achieved by assigning a global shortcut key to the HUD window
  2. It must be able to save a note as soon as you have taken it, and keep it even after the application quits, so you can view the notes next time you open it
  3. It should save the notes in the cloud, so that you can use the notes you have taken on every computer
  4. It should be able to store and show Rich Text Format
  5. It should support syntax coloring for code snippets, with the language either guessed or set (can be done using plugins)
  6. It should support text transforming (such as RTF to plain text, or Markdown or RST to HTML or other formats, via plugins, of course)
  7. It must interact with the clipboard

That’s why I think there should be a note taking tool for programmers.

I, as a programmer, really need a text transformer or a shelf that comes just as I call it; it should just jump to me without making me go to the desktop from my full screen application.

And after that, it will send the transformed text to the clipboard (just plain text most of the time), and just go away (with my text stored for next use).

I have tried every note taking tool I can get, but none of them supports my points.

They are beautiful, elegant and powerful, and can even sync with my iPhone, but they can’t meet my requirements 1, 5 and 6 (and those are the functions I want most).

So, I want to start a note taking tool on my own.

I’ll open source it on Github when I have finished the basic functions.

Go back to coding now.

A little complaint about launchctl

It’s been quite a while since my last post.

That’s because I’m working on something big in PHP: I’m writing a light and agile PHP development framework right now, and it has nearly finished the alpha stage.

I’ll release it to the world and open source it when I finish it.

Here are a few features I’d like to talk about:

  1. Based on CI (using the methodology and definitions of CI), for agile and quick development, and easy to understand and get your hands on
  2. Full support for responsive design, leveraging less and bootstrap 3.0
  3. Template functions from Smarty (and lots of plugins)
  4. Very smart responsive image support, leveraging a slightly modified jquery-picture and php-imagick; you can get the picture in any size you like on the fly
  5. Very smart Datatable Master View, using jquery-datatables, with the datatable integrated as the Master View deep in the controllers; coding a master view couldn’t be easier
  6. Very smart alternative Listview Master View, using the same methodology as jquery-datatables, with 2 modes of responsive design
  7. AOP support for PHP; can use regex and other string tools to intercept the controller methods (for security, transaction, auditing and logging usages)
  8. Sophisticatedly designed security model, which can be configured using the database or a template file, and uses AOP to do the security without invading the code

I can make quite a long list for this.

But I have to stop, since this post is for a little complaint about launchctl.

launchctl is a fundamental application of OS X: it handles all the daemon processes, and it can do lots of things.

I won’t say much about launchctl here; you can view the tutorial about it here.

Here is my complaint.

I use vim a lot at work. As everyone knows, vim supports ctags for symbol navigation.

But keeping the ctags tags file up to date whenever the code changes is quite tiresome.

So, I tried to add an auto execution script to let ctags update my tags file automatically at a short time interval (1 minute, for example).

Yes, I can use crontab to do this. But since I’m using OS X (and love it so much), I want to be more appleish, so I gave launchctl a try.

I tried the example here. It is a brilliant tutorial, but the sample it provides is not correct!!!!!

Here is the sample it provided:

<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">







See the problem???

You’ll be as confused as I was.

Yes, the problem is here:

    <key>label</key>

You should use Label instead of label, or launchctl won’t recognise this plist!!!!!!

And if you investigate more carefully, you can see that all the keys are capitalised except label!!!!!
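
One way to avoid this kind of typo is to generate the plist instead of typing the XML by hand. The sketch below uses Python’s standard plistlib module. The job itself is hypothetical: the label com.example.ctags-update, the script path and the 60 second interval are placeholders I made up for illustration, and this is not the sample from the tutorial.

```python
import plistlib

# Hypothetical job description; label, path and interval are placeholders.
job = {
    "Label": "com.example.ctags-update",  # capital L, as launchctl expects
    "ProgramArguments": ["/bin/sh", "/path/to/update-tags.sh"],
    "StartInterval": 60,  # run every 60 seconds
}

# plistlib writes the XML header, the <plist> wrapper and the <key> elements.
plist_xml = plistlib.dumps(job).decode("utf-8")
print(plist_xml)
```

The printed text can be saved as a .plist file; since the keys come straight from the dictionary, a lower case label can only appear if it is typed that way in the dictionary.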

Are you kidding me?!

I googled this problem for about 10 minutes (or maybe 30 minutes more, since I spend time reading while the page is loading; no kidding, I’m in mainland China and must get past the GFW to access Google, yes!)

And after that, launchctl still won’t work, it complains:

launch_msg(): Socket is not connected

This is quite a difficult problem, and there are quite few articles about it. Yes, once you know the answer, you can see why.

I tried to fix this problem for about 1 hour, just reading posts and other things. Frustrated, I nearly gave up.

Then I came across 1 forum post (sorry, forgot the url). It says this problem occurs in iTerm2 only, and not in the native Terminal.

I tried, and it passed!!!!!

So, this is a defect of iTerm2. This bug nearly made me give up OS X. Thankfully, at least now I know where the problem is and can spend my time on something more useful.

I don’t know if the defect had already been submitted to iTerm2’s bug tracker, so I have submitted it to them; the defect I submitted is here