Heterogeneous thoughts on software.

Relay Maven Over Ssh Using Vertx

The following Groovy 2.0 script provides HTTP access to files inside a network to which you only have SSH access.

Vertx Maven Relay
@Grab(group='org.vert-x', module='vertx-lang-groovy', version='1.3.1.final')

import org.vertx.groovy.core.Vertx

def sshCommand = 'ssh root@dcmcfarland.com'
def targetUrl = 'http://repo1.maven.org/maven2/'

def vertx = Vertx.newVertx()
def rand = new Random()

println "Starting maven ssh relay for $targetUrl"

vertx.createHttpServer().requestHandler { request ->
  println "Downloading: $targetUrl$request.uri"
  def targetFile = new File("/tmp/" + rand.nextLong())

  // Run wget on the remote host over ssh and redirect its output into the local temp file.
  ["sh", "-c", "$sshCommand 'wget -O - $targetUrl$request.uri' > ${targetFile.absolutePath}"].execute().text

  request.response.sendFile targetFile.absolutePath
  targetFile.delete()
}.listen(8888)

// Don't kill main thread until application is terminated.
synchronized (vertx) {
  vertx.wait();
}
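
The shell command at the heart of the handler can be looked at in isolation. The following is only a sketch: buildRelayCommand is a hypothetical helper that is not part of the script above, and it assumes the request URI is a plain path. It also trims the duplicate slash that appears when targetUrl ends with '/' and the request URI starts with one, and uses File.createTempFile rather than a random number for the temp file name.

```groovy
// Hypothetical helper (not in the original script): builds the argument list
// that is handed to `sh -c` for a single request.
def buildRelayCommand(String sshCommand, String targetUrl, String uri, File targetFile) {
    // Avoid a double slash between the base URL and the request path.
    def base = targetUrl.replaceAll('/+$', '')
    def path = uri.startsWith('/') ? uri : '/' + uri
    def remote = "wget -O - ${base}${path}"
    ["sh", "-c", "${sshCommand} '${remote}' > ${targetFile.absolutePath}".toString()]
}

// File.createTempFile picks a unique file name for each call.
def targetFile = File.createTempFile('maven-relay', '.tmp')
def command = buildRelayCommand('ssh root@dcmcfarland.com',
        'http://repo1.maven.org/maven2/', '/com/google/inject/guice/', targetFile)
println command[2]
targetFile.delete()
```

Printing the command before executing it is a handy way to check the quoting when sshCommand or targetUrl change.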

An example use case

  1. Configure your build or Maven settings by adding localhost:8888 as a new repository.
  2. Set targetUrl to the base location of your internal Maven repository, e.g. http://repo1.maven.org/maven2/
  3. Update sshCommand with your username and server location, e.g. ssh root@dcmcfarland.com
  4. Run: groovy VertxMavenRelay.groovy
  5. To test that everything is working, open your browser and go to http://localhost:8888/com/google/inject/guice/. The file listing for guice, the same one served at http://repo1.maven.org/maven2/com/google/inject/guice/, should display. If you now run a Maven build, Maven will be able to access files in the internal repository.
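
The mapping used in step 5 can be sketched in a few lines of Groovy. toRelayUrl is a hypothetical helper, not part of the relay script, and it assumes targetUrl ends with a slash, as it does in the script.

```groovy
// Hypothetical helper: maps a URL inside the target repository to the
// equivalent URL served by the relay on localhost.
String toRelayUrl(String repositoryUrl, String targetUrl, int port = 8888) {
    if (!repositoryUrl.startsWith(targetUrl)) {
        throw new IllegalArgumentException("URL is not inside $targetUrl")
    }
    // Everything after the base location is passed through unchanged.
    "http://localhost:${port}/" + repositoryUrl.substring(targetUrl.length())
}

println toRelayUrl('http://repo1.maven.org/maven2/com/google/inject/guice/',
        'http://repo1.maven.org/maven2/')
```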

N.B. If SSH keys have not been configured to provide passwordless SSH access, the terminal in which VertxMavenRelay is running will request your password every time an HTTP request is made.

See you next time.

Shakespeare’s 7 Monkeys: An Initial Adventure With Lucene

This is the first exercise in a tutorial series introducing Lucene, the text search engine library.

Source for the exercises in this series is available on GitHub, and the only prerequisite for running the initial exercises is Groovy 2.0. The texts that will be indexed in the exercises come from Project Gutenberg.

This exercise walks through a Groovy script to illustrate how simple it is to index a document and then search it. The script indexes ‘The complete works of Shakespeare’ and allows a single-term search to be performed. First we shall use a Groovy feature called @Grab, which adds the required Lucene dependencies to the script’s classpath.

Setting up Classpath
// @Grab is a nice feature for getting dependencies added to the classpath.
@Grab(group='org.apache.lucene', module='lucene-core', version='4.0.0')
@Grab(group='org.apache.lucene', module='lucene-queryparser', version='4.0.0')
@Grab(group='org.apache.lucene', module='lucene-analyzers-common', version='4.0.0')
@Grab(group='org.apache.lucene', module='lucene-queries', version='4.0.0')

import org.apache.lucene.analysis.*
import org.apache.lucene.analysis.standard.*
import org.apache.lucene.document.*
import org.apache.lucene.index.*
import org.apache.lucene.queryparser.flexible.standard.*
import org.apache.lucene.search.*
import org.apache.lucene.store.*
import org.apache.lucene.util.*

Next we shall create an IndexWriter using a standard analyser and a RAMDirectory where the index will be stored for the duration of the script.

Create RAMDirectory Index
// This script reads the text from shakespeare.txt and indexes each line.

// Search for a line containing the first argument if passed,
// otherwise search for lines with monkey.
def searchTerm = this.args.length > 0 ? "line:${this.args[0]}" : "line:monkey"

// Setup required lucene objects for writing to the lucene index.
def indexDirectory = new RAMDirectory();
def analyzer = new StandardAnalyzer(Version.LUCENE_40)
def writerConfiguration = new IndexWriterConfig(Version.LUCENE_40, analyzer)
def indexWriter = new IndexWriter(indexDirectory, writerConfiguration);

We shall then use an anonymous closure to add each line of ‘The complete works of Shakespeare’ into the index along with its associated lineNumber.

Index Document
// Index the shakespeare text file line by line.
new File("shakespeare.txt").readLines().eachWithIndex { line, lineNumber ->
  Document doc = new Document();
  doc.add(new IntField("lineNumber", lineNumber, Field.Store.YES))
  doc.add(new TextField("line", line, Field.Store.YES))
  indexWriter.addDocument(doc)
}
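
To get a feel for what indexing each line buys us, here is a toy inverted index in plain Groovy. It is only an illustration of the idea and makes no claim about how Lucene stores its index. The sample lines and the simple lower-case word splitting are assumptions made for the sketch.

```groovy
// Toy inverted index: maps each lower-cased word to the numbers of the lines
// that contain it. Purely illustrative; this is not how Lucene is implemented.
def lines = [
    'Into baboon and monkey.',
    'To be, or not to be',
    'Thou liest, thou jesting monkey, thou;'
]

// Each unseen word starts with an empty, sorted set of line numbers.
def index = [:].withDefault { new TreeSet() }
lines.eachWithIndex { line, lineNumber ->
    // Split on anything that is not a letter and drop empty fragments.
    line.toLowerCase().split(/[^a-z]+/).findAll { it }.each { word ->
        index[word] << lineNumber
    }
}

// Looking up a term is now a map access rather than a scan of every line.
println index['monkey']
```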

With the indexing of ‘The complete works of Shakespeare’ finished, it is now time to search for lines which contain a term.

Search Index
// Print out each line which matches the search term, with a return limit of 10000 matches.
def indexReader = indexWriter.getReader()
def query = new StandardQueryParser(analyzer).parse(searchTerm, "")
def indexSearcher = new IndexSearcher(indexReader)
def hits =  indexSearcher.search(query, 10000).scoreDocs

hits.collect{indexSearcher.doc(it.doc)}.each{ println "${it.lineNumber} ${it.line}"}
println "${hits.length} matches for ${searchTerm - 'line:'} found."

// Tidy up resources
indexReader.close()
indexWriter.close()

Finally it is time to run the script and see which lines match our term.

Run Script
$ groovy IndexAndSearchShakespeare.groovy Monkey
74109 for a monkey.
105489 Into baboon and monkey.
79411 On meddling monkey, or on busy ape,
33770 was the very genius of famine; yet lecherous as a monkey, and the
104044 CALIBAN. Thou liest, thou jesting monkey, thou;
12445 an ape, more giddy in my desires than a monkey. I will weep for
68612 LADY MACDUFF. Now, God help thee, poor monkey! But how wilt thou do
7 matches for Monkey found.

Try using wildcards too, as long as the wildcard is not the first character of the term (a leading wildcard breaks the rules of the Lucene Query Syntax).

Wildcard Search
groovy IndexAndSearchShakespeare.groovy 'Monk*'
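
If wildcards are new to you, the matching rule can be sketched in plain Groovy. wildcardMatches is a hypothetical helper written for this post, not a Lucene API. It treats * as any run of characters and ? as any single character, and it refuses patterns that start with a wildcard, mirroring the restriction mentioned above.

```groovy
import java.util.regex.Pattern

// Hypothetical helper: checks a single term against a wildcard pattern by
// translating the pattern into a regular expression.
boolean wildcardMatches(String pattern, String term) {
    if (pattern.startsWith('*') || pattern.startsWith('?')) {
        throw new IllegalArgumentException("Leading wildcard not allowed: $pattern")
    }
    // Translate character by character, quoting everything that is not a wildcard.
    def regex = pattern.collect { ch ->
        ch == '*' ? '.*' : (ch == '?' ? '.' : Pattern.quote(ch))
    }.join()
    term ==~ regex
}

println wildcardMatches('monk*', 'monkey')
println wildcardMatches('monk*', 'baboon')
```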

See you next time…