chemaxon.jchem.db.Importer enhancement request

User 8688ffe688

16-08-2012 22:21:54

It would be nice to have a reset method on the Importer object that releases the reference to the File passed to setInput. That would let me reuse a single Importer by making subsequent setInput(File) calls within a loop. See the code snippets below.


What I'm doing presently:



ByteArrayOutputStream bOut = new ByteArrayOutputStream();
PrintStream pOut = new PrintStream(bOut);
// inputDir is the directory holding the .sdf files (declared elsewhere, not shown)
File[] fileArr = inputDir.listFiles((dir, name) -> name.endsWith(".sdf"));
for (int i = 0; i < fileArr.length; i++) {
    FileInputStream fIn = new FileInputStream(fileArr[i]);

    // A new Importer is created and configured for every file
    Importer molImp = new Importer();
    molImp.setConnectionHandler(ch);
    molImp.setHaltOnError(false);
    molImp.setLinesToCheck(1024);
    molImp.setInfoStream(pOut);
    molImp.setTableName(tableName);
    molImp.setInput(fIn);
    int importCount = molImp.importMols();

    // Log the import details, then clear the buffer for the next file
    pOut.flush();
    String details = bOut.toString();
    log.info(details);
    bOut.reset();
    fIn.close(); // release the file so we can move it

    // Assuming Apache Commons IO: moveFileToDirectory(srcFile, destDir, createDestDir)
    FileUtils.moveFileToDirectory(fileArr[i], moveDir, true);
}


What I would like to do:



ByteArrayOutputStream bOut = new ByteArrayOutputStream();
PrintStream pOut = new PrintStream(bOut);
// One Importer, configured once and reused for every file
Importer molImp = new Importer();
molImp.setConnectionHandler(ch);
molImp.setHaltOnError(false);
molImp.setLinesToCheck(1024);
molImp.setInfoStream(pOut);
molImp.setTableName(tableName);

// inputDir is the directory holding the .sdf files (declared elsewhere, not shown)
File[] fileArr = inputDir.listFiles((dir, name) -> name.endsWith(".sdf"));
for (int i = 0; i < fileArr.length; i++) {
    molImp.setInput(fileArr[i]);
    int importCount = molImp.importMols();
    if (molImp.isFinished()) {
        pOut.flush();
        String details = bOut.toString();
        log.info(details);
        bOut.reset();
        molImp.reset(); // new method to release the file and reset the Importer
        // Assuming Apache Commons IO: moveFileToDirectory(srcFile, destDir, createDestDir)
        FileUtils.moveFileToDirectory(fileArr[i], moveDir, true);
    }
}



ChemAxon 9c0afc9aaf

18-08-2012 00:45:34

Hi Matt,


 


My colleagues will reply soon - presumably after some internal discussion.


Originally the Importer object was not designed to be reused.


Meanwhile, could you let us know the primary reason for this request, e.g. what would be the most important benefit for you if this were implemented?


Best regards,


Szilard

User 8688ffe688

20-08-2012 14:54:50

Because many large databases split up their files, importing all the files can be tedious and error prone. From a code standpoint, reducing the number of objects created is usually a good thing.

ChemAxon a9ded07333

30-08-2012 12:31:07

Hi Matt,


According to our tests, the overhead of creating a new Importer instance is negligible, both in time and in memory. Since Importer is not designed to be reused, we suggest using your first solution; the Importer's internal reference to your File object is released the next time you call setInput().


If your experience differs or you run into other difficulties with this method, please let us know; we would be glad to discuss it.


> importing all the files can be tedious and error prone


Could you describe what solution would suit your needs? If it takes longer to explain, we can arrange a teleconference via Skype or GoToMeeting.


Best regards,
Tamás

User 8688ffe688

30-08-2012 15:30:25

The first solution works and we are satisfied with it, but could you double-check the Importer.setInput(File) method for an unreleased reference to the file? We get an IOException when attempting to move the file. I suspect that some input stream on the file has not been closed.
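
For reference, here is a minimal sketch of the stream-based workaround: the caller opens the stream itself, closes it with try-with-resources, and only then moves the file. It does not call the JChem API. The StreamImport interface and the ImportAndMove.importAll helper are hypothetical names standing in for the per-file import step (for example a fresh Importer fed through setInput(InputStream)), and the move uses java.nio.file.Files rather than FileUtils.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the per-file import step; returns the number of items imported. */
interface StreamImport {
    int importFrom(InputStream in) throws IOException;
}

final class ImportAndMove {
    /**
     * Runs the import step on every *.sdf file in inputDir and moves each
     * file to moveDir once its stream has been closed.
     * Returns the sum of the counts reported by the import step.
     */
    static int importAll(Path inputDir, Path moveDir, StreamImport step) throws IOException {
        // Collect the file list first so the directory handle is closed before any move.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(inputDir, "*.sdf")) {
            for (Path p : dir) {
                files.add(p);
            }
        }
        Files.createDirectories(moveDir);

        int total = 0;
        for (Path file : files) {
            // try-with-resources closes the stream even if the import step throws.
            try (InputStream in = Files.newInputStream(file)) {
                total += step.importFrom(in);
            }
            // The stream opened above is closed at this point, so it cannot block the move.
            Files.move(file, moveDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        return total;
    }
}
```

Because the caller owns the stream, the file handle is released no matter what the import step does internally, which sidesteps the suspected leak in the File-based overload.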

ChemAxon a9ded07333

30-08-2012 15:43:16

Hi Matt,


Thank you, we will check (in our tests we did not move the file, we only checked reuse).
Could you send us the stacktrace of the IOException?


Best regards,
Tamás
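
One way to capture the requested stack trace as text, sketched with plain JDK calls only. MoveDiagnostics, stackTraceOf and tryMove are hypothetical helper names, and Files.move stands in for whichever move call raises the IOException in the real code.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;

final class MoveDiagnostics {
    /** Renders the full stack trace of t, including any causes, as a string. */
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        t.printStackTrace(pw);
        pw.flush();
        return sw.toString();
    }

    /** Attempts the move; returns null on success or the stack trace text on failure. */
    static String tryMove(Path source, Path target) {
        try {
            Files.move(source, target);
            return null;
        } catch (IOException e) {
            return stackTraceOf(e);
        }
    }
}
```

The returned text can be written to the log or pasted into a reply as is.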