Luckily, jars have manifests, which let you specify meta-information like Main-Class and version information. And... a Class-Path listing other jars and directories to depend on. And there's our way in.
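To make the manifest format concrete, here is a minimal sketch that parses the main section of a manifest and pulls out the Class-Path entries. The sample manifest, the `com.example.Main` class name, and the paths in it are made up for illustration. The parser only handles the basics: `Name: value` lines, plus continuation lines that start with a single space.

```python
def parse_manifest(text):
    """Parse the main section of a jar manifest into a dict of attributes."""
    attrs = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the main section
        if line.startswith(" ") and key is not None:
            # Continuation line: drop the single leading space and append.
            attrs[key] += line[1:]
        else:
            key, _, value = line.partition(": ")
            attrs[key] = value
    return attrs


# Hypothetical manifest; the names and paths are illustrative only.
SAMPLE = (
    "Manifest-Version: 1.0\n"
    "Main-Class: com.example.Main\n"
    "Class-Path: lib/a.jar /shared/classes/\n"
    "  /shared/lib/b.jar\n"
)

attrs = parse_manifest(SAMPLE)
# Class-Path is a space-separated list; directories end with a slash.
class_path = attrs["Class-Path"].split()
print(class_path)
```

Note that a continuation line is glued directly onto the previous one, so the space that separates two entries has to be part of the text itself. That is why the continuation line in the sample starts with two spaces.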
Let's suppose that you have a Hadoop cluster all linked via NFS (or sshfs, or...), you don't like jars, and you've carefully set up your $CLASSPATH to contain all the classes you'd ever care to use. Then try out this script:
#!/usr/bin/env python
# Creates a jar from a CLASSPATH:
import os

# open() does not expand "~", so resolve the home directory explicitly
out = open(os.path.expanduser("~/.super-manifest.txt"), "w")
out.write("Class-Path: ")
classpath = os.environ["CLASSPATH"]
for x in classpath.split(':'):
    if x != '':
        if not x.endswith(".jar") and not x.endswith("/"):
            x = x + "/"  # dirs must end with a slash
        # leading space marks a continuation line; trailing space separates entries
        out.write(" %s \n" % x)
out.close()
os.system("mkdir -p ~/.superlibs/lib")
# inner jar: nothing but the manifest with the Class-Path
os.system("jar cmf ~/.super-manifest.txt ~/.superlibs/lib/supererJar.jar")
# outer jar: contains the inner jar under lib/
os.system("jar cf ~/.superlibs/superJar.jar -C ~/.superlibs/ lib/supererJar.jar")

Now, in your code, when you set up your job:

jobConf.setJar("/home/YOURNAME/.superlibs/superJar.jar");

And give it a spin. Hadoop, with no jars!
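Before submitting a job, it can help to check that the classpath really made it into the nested jar. The following is a rough sketch that uses only Python's standard `zipfile` module. The function name `nested_class_path` is my own, and it assumes the layout built by the script: an outer jar with `lib/supererJar.jar` inside, whose manifest carries the Class-Path.

```python
import io
import zipfile


def nested_class_path(outer_jar, inner_name="lib/supererJar.jar"):
    """Return the Class-Path entries of a jar stored inside another jar."""
    with zipfile.ZipFile(outer_jar) as outer:
        inner_bytes = outer.read(inner_name)
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        manifest = inner.read("META-INF/MANIFEST.MF").decode("utf-8")
    # Undo manifest line folding: a line break followed by one space
    # means the value continues on the next line.
    unfolded = manifest.replace("\r\n", "\n").replace("\n ", "")
    for line in unfolded.split("\n"):
        if line.startswith("Class-Path:"):
            return line[len("Class-Path:"):].split()
    return []  # no Class-Path attribute found
```

Calling `nested_class_path(os.path.expanduser("~/.superlibs/superJar.jar"))` (after `import os`) should return the same entries as your $CLASSPATH, with a trailing slash on each directory.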