Category Archives: Essay

This is the category for all my essays. An essay here can be about anything, but mostly philosophy and programming.

Steps To Set Up an FTP Server

Here is something that sounds really basic: FTP setup.

Yeah, FTP setup is easy and fun, isn’t it?

All you need to do is install an FTP server, configure the users, and you’re done.

Piece of cake, right?

YOU ARE FUCKING WRONG!!!!!

I’ll write down my steps for setting up a secure FTP server, in case it helps out some freaking guy like me.

You should use good FTP server software.

This is an easy choice if you’re using a distribution like CentOS or RHEL.

They suggest installing vsftpd as the FTP server. I’m no expert in this domain, but so far vsftpd has worked fine for me.

You should create the FTP users in Linux and set up their permissions.

vsftpd uses Linux’s own users and file system as its user system and file system. That’s a brilliant design, because it gets the full sophistication of the Linux permission system for free.

But this requires you to treat your users and system more carefully: don’t leave the folders exposed over FTP, or the FTP users too open, or anyone will be able to read or update your files over FTP without any problem.
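For example, here is a minimal sketch of creating a dedicated FTP user (the name ftpuser and the home path are just placeholders, and depending on your PAM setup you may also need /sbin/nologin listed in /etc/shells for vsftpd to accept the login):

# create a user whose home directory will hold the FTP files
useradd -m -d /home/ftpuser -s /sbin/nologin ftpuser
passwd ftpuser
# keep the home directory closed to everyone else
chmod 750 /home/ftpuser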

Fine, this is not the key point I want to make, so I’ll keep it short. Let’s go to the KEY POINTS.

1. You must configure SELinux to accept your FTP server, or it will block vsftpd when it tries to access the file system.

This is fucking annoying, but it is true. If you don’t tell SELinux that vsftpd’s actions are fine, SELinux will block them to keep your folders safe.

SELinux can be your friend in many ways, so turning it off may not be a good option.

I googled for ways to make these two things work together, and here is the way:

/usr/sbin/setsebool -P ftp_home_dir=1 

This command updates the SELinux policy and gives the FTP daemon the privileges to access users’ home folders.

The command takes a little while to execute (the -P flag makes the change persistent across reboots), but it is the easiest way to achieve this, believe me.
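If you want to double-check the boolean, or see what SELinux is complaining about when uploads still fail, these standard commands can help:

# confirm the boolean is now set
getsebool ftp_home_dir
# look at the most recent SELinux denials
grep AVC /var/log/audit/audit.log | tail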

2. You must configure the iptables firewall to let FTP clients connect

This step is easy to understand: no one wants their server too open, so in the beginning iptables only lets ICMP and SSH requests reach the server’s ports.

To let FTP clients reach the server, you must open two ports: 20 for data transfer (active mode) and 21 for commands.

So the configuration for iptables should be like this:

-A INPUT -m state --state NEW -m tcp -p tcp --dport 20 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 21 -j ACCEPT

After this, your FTP client can connect to the server.

Are we finished yet?

NO!!! Not yet.

You still can’t upload your files onto the server.

Why?!

Because:

VSFTPD USES PASSIVE MODE BY DEFAULT, and passive mode FTP works like this:

  • The FTP client tells the server: let’s use passive mode
  • The server responds: you can connect to me on port xxxx for this transfer
  • The client opens a TCP connection from a local port to the server’s port xxxx to start the transfer

Yes, passive mode uses more ports on the server than active mode, but it works much better for clients behind NAT or firewalls, so it’s the better mode to use, isn’t it?

But remember that we only allowed ports 21 and 20 in iptables?

So this is a very, very big problem for FTP clients.

They’ll be confused: the server told them to open a connection to port xxxx, but when they try, they get a connection refused.

So, you need to:

3. Change the vsftpd configuration so passive mode only uses ports from a fixed range

For example, like this:

pasv_max_port=10100
pasv_min_port=10090

This restricts passive mode to ports 10090 through 10100.

Then

4. You need to change the iptables configuration to open ports 10090 to 10100

-I INPUT -p tcp --dport 10090:10100 -j ACCEPT
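If you prefer to add the rule from the command line rather than editing /etc/sysconfig/iptables by hand, a sketch for CentOS/RHEL 6 would be:

# insert the rule into the running firewall
iptables -I INPUT -p tcp --dport 10090:10100 -j ACCEPT
# persist the running rules back to /etc/sysconfig/iptables
service iptables save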

Then your FTP server is done and reasonably secure. And if you want the transfers themselves to be more secure, you can:

5. Add SSL/TLS transfer support to vsftpd

First you need to generate a self-signed certificate for SSL:

cd /etc/vsftpd
/usr/bin/openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout vsftpd.pem -out vsftpd.pem

This command generates a certificate (and key) for SSL, and the certificate will be valid for one year.
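If you want to inspect what was generated (and note that a 2048-bit key would be a stronger choice these days than rsa:1024), something like this works:

# show the subject and validity dates of the new certificate
openssl x509 -in /etc/vsftpd/vsftpd.pem -noout -subject -dates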

Then you need to edit /etc/vsftpd/vsftpd.conf, adding these lines:

# Turn on SSL
ssl_enable=YES

# Allow anonymous users to use secured SSL connections
allow_anon_ssl=YES

# All non-anonymous logins are forced to use a secure SSL connection in order to
# send and receive data on data connections.
force_local_data_ssl=YES

# All non-anonymous logins are forced to use a secure SSL connection in order to send the password.
force_local_logins_ssl=YES

# Permit TLS v1 protocol connections
ssl_tlsv1=YES

# Do not permit the insecure SSL v2 protocol
ssl_sslv2=NO

# Do not permit the insecure SSL v3 protocol
ssl_sslv3=NO

# Specifies the location of the RSA certificate to use for SSL encrypted connections
rsa_cert_file=/etc/vsftpd/vsftpd.pem

After these steps:

6. Restart all the services

service iptables restart
service vsftpd restart
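And if you want both services to come back after a reboot (assuming the CentOS/RHEL 6 style init scripts used above):

chkconfig iptables on
chkconfig vsftpd on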

And, you’re done.

So, what did we learn today?

  1. It is very hard to be secure, even for a seemingly easy and fundamental service like FTP
  2. Linux is secure only when you understand it deeply and use it carefully
  3. Don’t blame the firewall for your problems; it is protecting you
  4. When something goes wrong, maybe the only problem is your own understanding, so reading and asking before complaining is a good way to solve the problem

A few words about turning a Linux box into a wireless router

I have been working for about a month on an interesting project. It has something to do with captive portals.

I’m really a rookie in these technologies; I had studied networking before, but never as deeply as this time.

After these days of learning, I found out how powerful the Linux kernel is, and here is some of what I learned.

How to set up a wireless AP using Linux and a wireless adapter, in bridge mode

If you want to make a Linux box into a wireless AP in the easiest mode (bridge mode), you need something like this:

  1. A Linux box with at least kernel 2.6 installed (I’m using CentOS 6.4, a pleasant distro to play with)
  2. An Ethernet card in the Linux box, so it can connect to the router you want it to connect to
  3. A wireless adapter (the “antenna”), with its driver installed as a kernel module (it’s a long story, I’ll write another article about that)

Beware: make sure your wireless adapter supports running in AP (master) mode. You can check the supported modes with the iw tool; if your adapter doesn’t support AP mode, you are doomed, and you can’t use it as an AP.
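A quick way to check, as a sketch (the exact output format varies by driver):

# list the capabilities of your wireless hardware and look for "AP"
# under "Supported interface modes"
iw list | grep -A 10 "Supported interface modes"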

Then you can begin like this:

  1. You need hostapd installed; hostapd is what runs your wireless adapter as an AP. Unfortunately, you can’t install it with yum, so you must download the source code and compile it (not so hard).
  2. You need to create a bridge between the Ethernet interface (say eth0) and the wireless interface (say wlan0). It is very easy to create a bridge like this on Red Hat: just edit /etc/sysconfig/network-scripts/ifcfg-eth0 and add BRIDGE=br0 to the configuration, do the same for wlan0’s file, then restart the network and you have the bridge (a sketch follows after this list).
  3. Configuring hostapd is not very straightforward, and there are many options: you need to choose the wireless interface (wlan0 in most cases), the channel for the AP, the password settings, and the running mode (802.11n or 802.11ac if your adapter supports it). There are many blog posts about how to configure hostapd, so I won’t go into detail here.
  4. You must give the bridge an IP, so you have to edit /etc/sysconfig/network-scripts/ifcfg-br0: bring the bridge up first with ifconfig, then change the file to give it a static IP or use DHCP to get one from the router.
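Here is a rough sketch of what those pieces can look like (device names, SSID, passphrase and addresses are placeholders, and hostapd options vary quite a bit by driver and version):

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BRIDGE=br0

# /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=dhcp

# /etc/hostapd/hostapd.conf (minimal; the bridge= line adds wlan0 to br0)
interface=wlan0
bridge=br0
ssid=MyTestAP
hw_mode=g
channel=6
wpa=2
wpa_key_mgmt=WPA-PSK
wpa_passphrase=ChangeMe123
rsn_pairwise=CCMP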

Then you are done.

Since this AP runs as a bridge between wireless and Ethernet, it is the easiest and most robust way to run your Linux box as an access point.

How to set up a wireless router using Linux and a wireless adapter

Setting up a wireless router is more complex than just an access point, but the first steps are the same: you need hostapd installed and configured, and then:

  1. Install dnsmasq (you can install dhcpd if you want, but dnsmasq is easier to configure)
  2. Give a static IP to your wireless interface (it will be the LAN gateway, so give it an IP like 192.168.0.1)
  3. Make dnsmasq listen on the wlan0 interface, so every device that connects to wlan0 can get its IP address and gateway (192.168.0.1) from it
  4. Configure iptables to allow devices to reach the DHCP and DNS ports (a sketch follows after this list)
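As a sketch of those last two steps (the addresses and lease range are placeholders):

# /etc/dnsmasq.conf (minimal)
interface=wlan0
dhcp-range=192.168.0.50,192.168.0.150,12h

# iptables: let clients on wlan0 reach DHCP and DNS
iptables -A INPUT -i wlan0 -p udp --dport 67 -j ACCEPT
iptables -A INPUT -i wlan0 -p udp --dport 53 -j ACCEPT
iptables -A INPUT -i wlan0 -p tcp --dport 53 -j ACCEPT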

At this point, devices (phones or laptops) can connect to your router, but they can’t reach the WAN, because your router doesn’t yet know how to forward their traffic from wlan0 out to the WAN.

So you need:

  1. Add a NAT rule, something like iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE. This lets packets coming in from other interfaces be disguised as if they were sent from eth0 (the WAN port of your Linux box router)
  2. Don’t forget to let the kernel forward IP packets: net.ipv4.ip_forward=1
  3. And don’t forget to let iptables allow forwarding from wlan0 too: iptables -A FORWARD -i wlan0 -j ACCEPT (a consolidated sketch follows)
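Putting those three together, a consolidated sketch (eth0 as WAN, wlan0 as LAN; depending on your default FORWARD policy you may also need the return-traffic rule shown last):

# enable IP forwarding now and after reboot
sysctl -w net.ipv4.ip_forward=1
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf

# NAT everything that leaves through eth0
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# allow forwarding from the LAN, plus return traffic back to it
iptables -A FORWARD -i wlan0 -j ACCEPT
iptables -A FORWARD -i eth0 -o wlan0 -m state --state RELATED,ESTABLISHED -j ACCEPT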

After this, every IP packet coming from wlan0 can find its way to eth0 and out through the kernel, which is basically how your home router works too.

Conclusion

It is not so hard (yet not exactly easy) to turn a Linux box into a wireless router. Even when you finally get it done, it won’t perform any better than a router you could purchase for the same price (that one has a tuned kernel and hardware), but you gain as much control as you want, and you can keep hacking.


About the mod_perl and perl-Apache-Test conflict when installing PacketFence

It’s been a very long time since my last post.

I’ve been very busy these days. I’m now working for a company trying to build a captive portal, so I gave PacketFence a try.

It is a very nice application that runs on CentOS.

I started with CentOS 6.4, added the repositories it needed, and then:

yum install -y packetfence

What a nice day!

But bang!!! What!?

file /usr/share/man/man3/Apache::Test.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

file /usr/share/man/man3/Apache::TestConfig.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

file /usr/share/man/man3/Apache::TestMB.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

file /usr/share/man/man3/Apache::TestMM.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

file /usr/share/man/man3/Apache::TestReport.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

file /usr/share/man/man3/Apache::TestRequest.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

file /usr/share/man/man3/Apache::TestRun.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

file /usr/share/man/man3/Apache::TestRunPHP.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

file /usr/share/man/man3/Apache::TestRunPerl.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

file /usr/share/man/man3/Apache::TestServer.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

file /usr/share/man/man3/Apache::TestSmoke.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

file /usr/share/man/man3/Apache::TestTrace.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

file /usr/share/man/man3/Apache::TestUtil.3pm.gz conflicts between attempted installs of perl-Apache-Test-1.30-2.el6.rf.noarch and mod_perl-2.0.4-10.el6.x86_64

A conflict in the man pages!!! Man!!!

You get a conflict in the f*cking manual pages, which I wasn’t even supposed to read, and it fails the installation completely!!!

I tried googling for it, and found plenty of reports of the same problem.

And none of them ever got a real fix….

So how can I go on?

I tried a dirty way.

First, install mod_perl and perl-devel using yum:

yum install -y mod_perl perl-devel

Then download Apache-Test from CPAN:

wget http://search.cpan.org/CPAN/authors/id/P/PH/PHRED/Apache-Test-1.38.tar.gz

Then unpack, build and install it:

tar xzf Apache-Test-1.38.tar.gz
cd Apache-Test-1.38
perl Makefile.PL
make install
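Before going back to yum, you can quickly check that the module is now visible to the system perl (just an optional sanity check):

# prints the path of the installed module if perl can find it
perldoc -l Apache::Test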

But even after installing the f*cking Apache-Test module, we aren’t done yet.

Yum doesn’t know you have installed Apache-Test, so it will still try to download and install the perl-Apache-Test package, and fail your installation again.

So you need to tell yum to skip it, by adding one line to /etc/yum.conf:

exclude=perl-Apache-Test

Then try the yum install again, and you’ll be done.

This investigation cost me an hour and a lot of frustration, so I wrote this post to record it. If it helps anyone, my time and energy won’t have been wasted.

TakTuk

Thoughts on daemon process launching and management

For the project I’ve recently been working on (fetching data and analysing it), I set up a crawler and analyser cluster.

First, many crawlers (as many as the server can take) must be spawned and configured.

Since the crawlers can use ZooKeeper to configure themselves, the configuration part doesn’t need to be considered here.

The hard part of this architecture is that you need to launch many crawler instances by hand, and restart them all whenever you want to reconfigure them (for example the data processing rules: they are loaded at the very beginning and never reloaded during processing, partly because the rules don’t change that often, and partly because the rules need to be compiled).

Sure enough, I wrote a script to launch and kill the crawlers. Here are my thoughts on how to implement the script and what functions it should have:

  • 1 It must be executable, with almost no dependencies, on most Linux distributions

This is the same thinking as for the crawler itself: I want the crawler to run on as many machines as possible, which is why it is based on jersey (the JavaScript runtime I wrote on top of Rhino, which is based on Java). For the script, that leaves three options: bash, Perl or Java.

  • 2 It must be easy to deploy to many machines, or be able to publish itself

This is a fundamental function of the launching script. I’m already tired of deploying the crawler to many machines, and deploying the script by hand would be an even bigger burden.

The crucial part of this function is how to publish the script itself to many machines, so that you can spawn many processes on each machine and keep managing them.

To publish the script to other machines, we can use scp, a powerful, easy and safe way to copy resources from machine to machine. With just a little configuration (SSH keys), you can copy files from the master to many slaves without any interaction.

I can write a script to handle that, so that after I deploy the script on the master machine, it deploys itself automatically to the many slave machines.
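A minimal sketch of that publishing step, assuming passwordless SSH keys are already set up and a hypothetical slaves.txt lists one slave host per line:

# copy the launcher script (hypothetical name) to every slave
while read -r host; do
    scp launcher.sh "$host:/opt/crawler/"
done < slaves.txt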

For this function I found TakTuk, a very good tool for managing many slave machines (say, installing Java or jersey on them). It is written in Perl and is more powerful at publishing, but I won’t go into it deeply in this article, since it doesn’t have the ability to spawn many processes and manage them (though you really can use it as the publishing layer).

  • 3 It should be thin, light and cheap when spawning processes

I don’t think running this script as a server is a good idea (in my opinion, at least). The script should just launch the processes with the proper stdin, stdout, stderr and working directory, and die after that. Keeping the script running and wasting resources is not wise. It is just a launcher, after all.

  • 4 It should be configurable, for example in how many processes to launch at a time

As I wrote above, I want the script to launch many crawlers on one machine, so that the crawlers use as much of that machine’s resources as we can get. So I want the script to spawn many crawler processes at a time, each with its own stdout and stderr.

And yes, the slave machines’ IPs should be configurable for this script.

  • 5 It should launch the processes as daemons

The crawlers run day and night on the server, and I don’t want them to live only inside my SSH session. So I must make them OS daemons that run in the background, unharmed by signals from the terminal or the session. The script can use daemonize or nohup to achieve this.
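As a sketch of that launching idea (crawler.jar and the count are placeholders; daemonize would work just as well as nohup here):

# spawn N crawlers, each detached from the terminal, with its own logs and pid file
N=4
for i in $(seq 1 "$N"); do
    nohup java -jar crawler.jar > "crawler-$i.out" 2> "crawler-$i.err" < /dev/null &
    echo $! > "crawler-$i.pid"
done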

  • 6 It should provide a function to check the processes’ running status

The crawlers are programs (and there are quite a few of them), so no wonder some of them will run into one problem or another and stop working (bugs, lack of memory or disk, the Java VM crashing, or even the kernel crashing).

So I need to check their health: if some slave’s crawler has died, or a slave has been restarted, I can see it when I check. Or I can write a script that checks automatically and, if something is wrong, sends me an email about the problem, so I can plan to restart them or do something else.
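With the pid files from the sketch above, the health check can be as simple as this (again just a sketch; the kill function in point 7 can reuse the same pid files):

# report any crawler whose process is no longer alive
for pidfile in crawler-*.pid; do
    if ! kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "dead: $pidfile"
    fi
done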

  • 7 It should provide a function to kill the running processes

This is very useful for redeploying the scripts, or redeploying the data analysis rules for all the crawlers. Say I have added another rule to the crawler: I need to publish the rule to all the slave machines (easily done with TakTuk), and then restart all the crawler processes.

So for points 1, 2 and 3, I think bash and Perl are the best choices, and since publishing and remote execution can be handled by TakTuk, I chose bash as the scripting language.

Thanks to TakTuk’s abilities, I can keep the management logic on the master and manage all the servers from there; I just need to redirect stdout back to the master and I get every detail of each slave’s status.

Maybe you will ask: why bother? If you just need a job manager, why not use Hadoop? Hadoop is very good at executing and managing jobs.

The answer is that Hadoop, or MapReduce, is not a good fit for crawling. Crawling is recursive by nature: it starts from a seed point and discovers more and more tasks from there, and you don’t know in advance how many times it will recur; MapReduce is not good at recursive operations.

I certainly use Hadoop, but only to process the data the crawlers have fetched; for the crawling itself, it is not useful.

To close this article: running a cluster of crawlers is very difficult, and the logic of a crawler gets very complex if you don’t want it to eat too much of your precious time. I’ll write about how I built the crawler in another article.

Thanks to TakTuk; without it, the work on my scripting tool could have been even harder than writing the crawler. Life is hard when analysing massive data, but with better tools, at least, your life gets easier.

Let’s Talk About JavaScript (Part 1)

Mammon slept. And the beast reborn spread over the earth and its numbers grew legion. And they proclaimed the times and sacrificed crops unto the fire, with the cunning of foxes. And they built a new world in their own image as promised by the sacred words, and spoke of the beast with their children. Mammon awoke, and lo! it was naught but a follower.

from The Book of Mozilla, 11:9 (10th Edition)

To me, programming languages can mean many things: they are tools for doing specific jobs, they are the words and grammar for describing a problem and how to handle it, and they are methodologies for turning the world’s reality into logic and entities.

So, to me, a programming language has three parts:

1. Methodology: This is the fundamental side of the language, its philosophy, its core concepts. It explains why the language exists, what its original purpose was and how it evolved. This is the metaphysical side of the language, but also the most important part for understanding it deeply (in my opinion, at least).
2. Words and grammar: This is the body of the language: how to express control flow, how to declare a data structure.
3. Fundamental libraries: Although the first two parts define what the language is, the fundamental libraries provided by the language runtime are crucial; they can even determine the success of the language.

As this article is the first part of the discussion of JavaScript, I’ll talk about the methodology of JavaScript first.

As I said, the methodology is the most important part of a language, so how the language was designed and why it was created should be discussed at the very beginning.

First, the name: JavaScript is a trademark of Oracle Corporation (per Wikipedia). It is the common name for ECMAScript v3 (for today’s discussion), so when I talk about JavaScript on this blog, I’m really talking about ECMAScript (and v3 only): just the standard features and the standard built-in functions and objects.

So, what is ECMAScript then?

ECMAScript is an object-oriented programming language for performing computations and manipulating computational objects within a host environment.

And what is the purpose of ECMAScript? Why do we need it?

ECMAScript was originally designed to be a Web scripting language, providing a mechanism to enliven Web pages in browsers and to perform server computation as part of a Web-based client-server architecture.

As we can see in the ECMAScript introduction [http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdf] and this overview [http://inventors.about.com/od/jstartinventions/a/JavaScript.htm], JavaScript was originally inspired by the idea of Java: that you could run computations and show animations in the web browser (web pages were really dull back then).

So in the design of JavaScript, the author didn’t even consider the most fundamental facilities of a programming language, standard input and standard output, to be necessary.

As stated in the ECMAScript standard:

ECMAScript as defined here is not intended to be computationally self-sufficient; indeed, there are no provisions in this specification for input of external data or output of computed results. Instead, it is expected that the computational environment of an ECMAScript program will provide not only the objects and other facilities described in this specification but also certain environment-specific host objects, whose description and behavior are beyond the scope of this specification except to indicate that they may provide certain properties that can be accessed and certain functions that can be called from an ECMAScript program.

So we can draw a conclusion here:

JavaScript (the language standardised as ECMAScript v3) is a web scripting language. It is interpreted by the host environment (runtime), and beyond the language grammar, everything else it needs is provided by, and depends on, the host environment, not the language itself.

Then we get to the real nature of JavaScript: it is an interpreted, embedded language for running computation tasks on behalf of a host environment.

As for why I think JavaScript is a good (in fact very good) embedded interpreted language, let’s look at JavaScript’s philosophy (I’ll try to make it clear, but I’m not teaching JavaScript here; if you want the details of JavaScript programming, you can find them on W3Schools).

1. JavaScript is an object-oriented language

Everything in JavaScript is an object, and an object itself is something like a hash table (so JavaScript doesn’t need a separate data structure for hash tables). An object can have a function as its constructor, and if you are clever enough, you can give objects a parent object (even a parent constructor). Because objects are basically hash tables, you can’t have anything truly private in an object: everything in the object can be seen by anything that can get hold of the object, and JavaScript even lets you iterate over the property names of an object.

Since everything is an object, primitives and functions are objects too. But primitives (numbers, for example) can’t really act as hash tables: although you can set properties on them, they won’t actually keep any of those properties.

This purity gives JavaScript its simplicity; if you don’t agree, look at what a mess Java has become.

2. JavaScript is a prototype-based language

JavaScript is not the first prototype-based language, but I think it is the most famous one. The prototype-based nature of JavaScript makes it easier to implement, while keeping the good parts of object orientation. I’ll talk about prototype-based languages in other articles, so I’ll skip the details here.

3. JavaScript is a weakly typed language

JavaScript does have types: Undefined, Null, Boolean, String, Number and Object. They can convert to each other (except to Object, because we don’t have to: everything is an object, remember?).

But variables don’t have types, so you can assign anything to any variable and JavaScript won’t complain at all.

But JavaScript won’t let you create your own types, and there is nothing to help you create anything type-like (since JavaScript is prototype-based). This seems weird (especially coming from Java’s world), but it keeps JavaScript implementations really simple (if we had classes, we would also need packages and many other things, unless we wanted a messy environment).

4. JavaScript is an embedded language

JavaScript can do whatever the host environment wants it to do. If the host environment doesn’t provide some fundamental facility, JavaScript simply doesn’t have it. Suppose a calculator can run JavaScript: on that calculator, JavaScript may just be an object-oriented calculation language that can only crunch numbers and output text.

Everything about JavaScript’s nature points to one conclusion:

JavaScript was built to be an embedded language, and it focuses on the simplicity of the language (and of its implementation), so that the language (or rather, its implementations) can run on most machines and can be easily implemented and maintained.

And thanks to the simplicity of JavaScript’s language design, in many circumstances JavaScript can be one of the fastest scripting languages on a given platform, and its philosophy makes it a very nice embedded language to use.

Yes, JavaScript has its flaws, but not in its philosophy; we’ll talk about those later.

So, a final word to conclude this article:

JavaScript is an object-oriented, prototype-based, weakly typed scripting language. Its aim is to be a simple scripting language that runs inside a host environment, and it has achieved that design goal.

We’ll talk about JavaScript’s grammar in the next article.