Many apps for NAS systems need to process file data on local file systems or on remote file servers. This "Learn by Doing" exercise tackles the following problem:
- Scan files (local or remote)
- Store the metadata of folders and files in a database
- Extract metadata and text information from specific file types, such as PDF, Office, images, etc.
[Figure: User manual of fscan.py]
According to the user manual of fscan.py, the following code sets up the program framework.
[Figure: The empty program framework of fscan.py]
- def main(argv): main program just runs parseArgs()
- def parseArgs(): parse arguments
- def printHelp(): print help message for '-h' arguments
- if __name__ == "__main__": main(sys.argv[1:]): the entry point is main()
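The framework described in the list above might look roughly like the following. This is only a minimal sketch: the '-a' option, the usage text, and the dict returned by parseArgs() are assumptions made for illustration (only '-h' is named in the article), and parseArgs() is given an explicit argv parameter here.

```python
import getopt
import sys


def printHelp():
    """Print the help message shown for the '-h' argument."""
    # The option list below is assumed, not taken from the real user manual.
    print("usage: fscan.py [-h] [-a host/folder]")


def parseArgs(argv):
    """Parse command-line arguments and return them as a dict."""
    options = {"help": False, "add": []}
    try:
        opts, _rest = getopt.getopt(argv, "ha:")
    except getopt.GetoptError:
        # Unknown option: fall back to showing the help message.
        options["help"] = True
        return options
    for flag, value in opts:
        if flag == "-h":
            options["help"] = True
        elif flag == "-a":  # hypothetical "add repository" option
            options["add"].append(value)
    return options


def main(argv):
    """Main program: runs parseArgs() and prints help when requested."""
    options = parseArgs(argv)
    if options["help"]:
        printHelp()
    return options


if __name__ == "__main__":
    main(sys.argv[1:])
```

Returning the parsed options instead of acting on them directly keeps parseArgs() easy to try out from the interpreter.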
addRepository()
Parse URL/URI
First, we have to implement the function addRepository(), which checks that the added "host/folder" is valid and stores the information in the db. To extend the "host/folder" definition to a URI/URL, I searched Google for "python parse url" and opened the first result:
20.16. urlparse - Parse URLs into components. I changed the link to point to the Python 2.6.x documentation. The sample code is simple; just copy and paste it into the Python interpreter to test it.
[Figure: urlparse: return (scheme, netloc, path, params, query, fragment)]
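As a quick check of the six-part result, something like the following can be pasted into the interpreter. It is a small sketch, not the article's original sample; the URL is made up for illustration, and the import falls back to the Python 2.6 module name used in the article.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2.6, as used in the article

# Illustrative URL; any file:// repository URI works the same way.
parts = urlparse("file://localhost/share/docs")

# The result unpacks into the six components named in the documentation.
scheme, netloc, path, params, query, fragment = parts

print(scheme)  # file
print(netloc)  # localhost
print(path)    # /share/docs
```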
According to the documentation, urlparse supports the following URL schemes: file, ftp, gopher, hdl, http, https, imap, mailto, mms, news, nntp, prospero, rsync, rtsp, rtspu, sftp, shttp, sip, sips, snews, svn, svn+ssh, telnet, wais. In this article, only local and remote file repositories are parsed and added.
- file://localhost/path: e.g. "/share/" denotes the repository file://localhost/share/.
- os.path.isdir(arg): Check if the local directory exists or not.
- os.path.abspath(arg): Get the absolute path of local dir. Then map into: file://localhost/path.
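The mapping from a local directory to a file://localhost/path URI can be sketched as below. The helper name local_dir_to_uri is hypothetical, and the sketch assumes POSIX-style paths (as on a NAS) with no characters that need URL quoting.

```python
import os


def local_dir_to_uri(arg):
    """Map an existing local directory to file://localhost/<absolute path>.

    Returns None when the directory does not exist.
    """
    if not os.path.isdir(arg):
        return None
    path = os.path.abspath(arg)  # also drops a trailing "/"
    return "file://localhost" + path


print(local_dir_to_uri("."))
print(local_dir_to_uri("no-such-directory-here"))
```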
[Figure: addRepository() extracts and parses the path to insertDbResp() as a task]
PostgreSQL supports trust connections (for security, restrict them to localhost only), so I created the user "nas" and built the tables, views, and functions in the database "fs". Then I tested the db connection and retrieved data from the db.
[Figure: sample code for db connection and db query]
[Figure: get the db record]
- pg_hba.conf: "host all all 127.0.0.1/32 trust" or "host all all localhost trust"
- import psycopg2: run "ipkg install py26-psycopg2" over ssh to install the package.
- global conn: Create a global object for the db connection to avoid opening and closing connections frequently.
- conn = psycopg2.connect("host=localhost dbname=fs user=nas"): trust connection with user=nas.
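The connection handling listed above could be arranged like this. It is a sketch under assumptions: the helper names build_dsn and get_connection are hypothetical, and psycopg2 is imported lazily so the module loads even where the package or the server is not available.

```python
conn = None  # global connection object, reused across calls


def build_dsn(host="localhost", dbname="fs", user="nas"):
    """Build the connection string used for the trust connection."""
    return "host=%s dbname=%s user=%s" % (host, dbname, user)


def get_connection():
    """Open the db connection once and reuse it afterwards."""
    global conn
    if conn is None:
        import psycopg2  # third-party package, imported only when needed
        conn = psycopg2.connect(build_dsn())
    return conn


print(build_dsn())  # host=localhost dbname=fs user=nas
```

Calling get_connection() requires a running PostgreSQL server with the trust entry in pg_hba.conf, so only build_dsn() is exercised here.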
A repository corresponds to one record in the Class table, so the following metadata are stored in the db.
- Name: e.g. share
- Since: creation date/time, i.e. now().
- LastModifyDT: last-modified date/time of the repository, used for validation checks.
- Scheme: e.g. file=11; smb=12; ftp=13, etc.
- Host: localhost
- Path: /path
- URI: a derived attribute, i.e. file://localhost/path
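To make the attribute list concrete, here is a rough sketch that derives these fields from a repository URI. The function name repository_record, the dict layout, and taking Name from the last path component are assumptions; the scheme codes are the examples given above, and Since is left to the database's now().

```python
import posixpath

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2.6

# Scheme codes taken from the examples in the list above.
SCHEME_CODES = {"file": 11, "smb": 12, "ftp": 13}


def repository_record(uri, last_modify_dt):
    """Derive the Class-record fields of a repository from its URI."""
    parts = urlparse(uri)
    path = parts.path
    # Assumption: the repository name is the last path component.
    name = posixpath.basename(path.rstrip("/"))
    return {
        "Name": name,
        "LastModifyDT": last_modify_dt,
        "Scheme": SCHEME_CODES[parts.scheme],
        "Host": parts.netloc,
        "Path": path,
        # URI is derived from scheme, host, and path.
        "URI": "%s://%s%s" % (parts.scheme, parts.netloc, path),
    }


record = repository_record("file://localhost/share/", "Thu Jan  1 00:00:00 1970")
print(record["Name"], record["Scheme"], record["URI"])
```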
Whatever the details of the Class table and its relationships to other tables, you can simply create functions or views that fit your needs. In this case, I created a function InsertClass(parentCID, Name, LastModifyDT, Scheme, Host, Path) that adds a class record under "/Home/Repository" to build the Class hierarchy.
To get the metadata (last-modified date) of a directory, I Googled "python directory time" and found "How to get file creation & modification date/times in Python?". From it I learned a concise Python coding style, and I borrowed the code both to get the metadata of a directory and to pack the arguments of a SQL function. The "last modified date" of the directory is retrieved into mtime.
[Figure: use (a, b, .., ) to get return array]
[Figure: use (a, b, .., ) to pack arguments of function()]
- os.stat() retrieves the metadata of a directory or a file as a tuple, which is unpacked into (...).
- time.ctime() converts a timestamp (seconds since the epoch) into a readable date/time string.
- sql = (...) packs the arguments of the SQL function into a tuple, which is then converted to a string with str(sql).
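The three steps above can be sketched as follows. The helper names and the argument list passed to InsertClassFS are illustrative assumptions, since the article elides the real signature. Note that str() on a tuple only produces usable SQL for simple values without quote characters and for two or more arguments, so a parameterized query is the safer choice for untrusted input.

```python
import os
import time


def dir_mtime_text(path):
    """Return the last-modified time of a directory as readable text."""
    # os.stat() returns a 10-item result that unpacks like a tuple.
    (mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime) = os.stat(path)
    return time.ctime(mtime)


def pack_call(func_name, *args):
    """Pack arguments into a tuple and turn it into a SQL function call."""
    return "SELECT " + func_name + str(args)


mtime_text = dir_mtime_text(".")
# Illustrative argument list; the real InsertClassFS signature may differ.
sql = pack_call("InsertClassFS", "share", mtime_text, 11, "localhost", "/share")
print(sql)
```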
[Figure: return object to array with a list of variables]
[Figure: array to string for packing arguments]
Finally, once the SQL function InsertClassFS(...) is complete, the program can be tested by adding a repository to the Class hierarchy.
[Figure: InsertClass: add a class as a subclass of _PCID]
[Figure: InsertClassFS(): reuse InsertClass() to add a repository as a subclass of "Root/Repository"]
Add code to execute the SQL and commit the transaction.
[Figure: Execute SQL and Commit Transaction]
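The execute-and-commit step follows the standard Python DB-API pattern, sketched below. To keep the example runnable without a PostgreSQL server, it uses an in-memory sqlite3 database as a stand-in; the helper name run_and_commit and the two-column Class table are assumptions for the demo only. With psycopg2 the same helper would apply, but placeholders are written as %s instead of ?.

```python
import sqlite3


def run_and_commit(conn, sql, params=()):
    """Execute one statement, commit, and return any result rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # description is None for statements that return no rows.
        rows = cur.fetchall() if cur.description is not None else []
        conn.commit()
        return rows
    except Exception:
        conn.rollback()  # undo the failed transaction before re-raising
        raise
    finally:
        cur.close()


# Stand-in database; the article itself uses PostgreSQL via psycopg2.
demo = sqlite3.connect(":memory:")
run_and_commit(demo, "CREATE TABLE Class (Name TEXT, Path TEXT)")
run_and_commit(demo, "INSERT INTO Class VALUES (?, ?)", ("share", "/share"))
rows = run_and_commit(demo, "SELECT Name, Path FROM Class")
print(rows)
```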
I ran the SQL statements (lines 1 and 2) to update the Class table, as shown below.
[Figure: Class table before execution]
[Figure: Run the program to add a repository]
[Figure: Class table after execution]
... To be continued!