Duplicates in two columns in a table

[Image: thor-in-the-poppies-lisa-twede]

That’s a painting of my cat Thor in the backyard.  Apparently he’s looking at the hose, despite the fact that to his right the yard is full of California Poppies.

Today’s blog has some code that is not my own.  Michael Powaga posted it over on Stack Overflow.  I just wanted to keep track of this little gem because I’ve needed it twice now and both times I ended up going with his solution.

I needed to know how many rows (and which ones) were duplicates, not on a single column but on the combination of two columns.  I had a situation where (I feel) there should have been a constraint to prevent duplicates in this table.  I wanted to neatly identify the offending records so I could deal with them.

This is his code.  It’s at the link above, but also copied here just in case.

select s.id, t.* 
from [stuff] s
join (
    select name, city, count(*) as qty
    from [stuff]
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city
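
If you want to try the query without touching a real database, here is a minimal sketch that runs it against an in-memory SQLite table from Python.  The table contents are made-up sample data, the square brackets around the table name are dropped, and the order by is my addition so the output is predictable.  Treat it as an illustration of the idea rather than anything tied to a particular database.

```python
import sqlite3

# The same query as above, minus the SQL Server square brackets.
DUPLICATES_SQL = """
select s.id, t.*
from stuff s
join (
    select name, city, count(*) as qty
    from stuff
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city
order by s.id
"""

def find_duplicates(rows):
    """Load (id, name, city) rows into a throwaway table and return the duplicated ones."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table stuff (id integer, name text, city text)")
        conn.executemany("insert into stuff values (?, ?, ?)", rows)
        return conn.execute(DUPLICATES_SQL).fetchall()
    finally:
        conn.close()

# Hypothetical sample data: ids 1 and 3 share both name and city.
sample = [
    (1, "Ann", "Burbank"),
    (2, "Ann", "Glendale"),
    (3, "Ann", "Burbank"),
    (4, "Bob", "Burbank"),
]
for row in find_duplicates(sample):
    print(row)
```

Each returned row carries the id of an offending record plus the name, city, and how many rows share that pair.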

XML Parser

I needed to parse an XML import file in a Java application.  I liked what I read about JAXB here, so this is the XML parser I used.  What I liked about it is that I didn’t need to write code to deal with the XML file line-by-line.  With JAXB, you do a little prep work to create a schema for the XML file.  At run-time the JAXB methods just need to know the location of the XML file and the location of the class definitions, and they create your Java objects.  Then you just deal with these objects in your Java code however you please.  Not a lot of fuss involved.

Creating the schema

First step is to create a schema from your XML file.  I used a free online schema generator.  I just uploaded my XML file and it created an XSD schema for me.

Creating classes (called binding the schema)

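The Java SDK comes with a tool for this, at least through Java 10 (from Java 11 on, JAXB and xjc are no longer bundled with the JDK and have to be added separately).  If your JDK includes it, you don’t need to download or install anything that you don’t already have.  The tool is xjc, and it’s in Java’s bin folder.  You can type xjc -help to learn about all the options.  But the main syntax you need is this: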

xjc -d <directory to create Java class files in> -p <package name to use in the class files> <schema created in previous step with a .xsd file extension>

Java code

You need these imports:

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import java.io.File;

The code:

 // "com.example.generated" is a placeholder: use the package name you passed to xjc with -p
 JAXBContext jaxbContext = JAXBContext.newInstance("com.example.generated");
 Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
 // "import.xml" is a placeholder: use the location and name of the XML file to parse
 Object o = unmarshaller.unmarshal(new File("import.xml"));

You can cast the output to the object type immediately if you’re only using this code for one type of XML file and you know what type of object it will always generate.  Or leave it as an Object if you want this to handle any XML file, and then later check the type of object with instanceof.

SQL – Find the largest number in a set of alpha-numeric values

The SQL:

declare @lastIntUsed int 

create table #temp (clean_number varchar(50), IntCol int) 
-- tally: the integers 1 through 100, one per character position to inspect
; with tally as (select top (100) N=ROW_NUMBER() over (order by @@spid) from sys.all_columns),
data as ( 
   select sometable.number, clean_number
   from sometable
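   -- split each value into single characters, keep only the digits, and glue them back together in order
   cross apply (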
      select (select C + '' FROM (select N, SUBSTRING(sometable.number, N, 1) C  
          from tally
          where N<=DATALENGTH(sometable.number)) [1]
      where C between '0' and '9'
      order by N
      for xml path('')) 
   ) p (clean_number)
   where p.clean_number is not null 
) 
insert into #temp (clean_number, IntCol) 
select number, cast(clean_number as int) IntCol from data

set @lastIntUsed = (select MAX(IntCol) from #temp) 
print @lastIntUsed
drop table #temp

I couldn’t find any way to just get the max number from the CTE itself (doesn’t mean there isn’t a way), so I said screw it – I’ll just stick the results into a temp table and then I can get the max number or the min or the max number that is less than some value or whatever I want.
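
For comparison, here is a minimal sketch of the same idea outside the database: strip everything that isn't a digit from each value, turn what's left into an integer, and take the max.  The function name and the sample values are made up for illustration.  Like the SQL, it glues all the digits in a value together, so "A17B3" counts as 173.

```python
import re

def largest_embedded_number(values):
    """Return the largest integer formed from the digits of each value, or None."""
    numbers = []
    for value in values:
        digits = re.sub(r"[^0-9]", "", value)  # keep only the characters 0-9
        if digits:  # skip values with no digits at all, like the SQL's null check
            numbers.append(int(digits))
    return max(numbers) if numbers else None

# Hypothetical sample values
print(largest_embedded_number(["INV-0042", "A17B3", "none", "X9"]))
```

Once the cleaned numbers are in a list you can just as easily ask for the min, or the max below some threshold, which is the same flexibility the temp table gives you.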

Faking a connection refusal

Your application connects to another server to request a service.  Most of the time this hums along with no problem.  But once in a while the server you are attempting to connect to refuses the connection.  When this happens, the phones ring off the hook.

You would like to catch this scenario before all your users get angry.  So you put some monitoring in place.

To test the monitoring, you need to fake the connection refusal.  This took me longer than I care to admit to figure out, but it turns out the answer is pretty simple.

Change the port number.

So say you have Axis2 software running at the port number you are connecting to.  Keep your connection string as it is, except for the port number.  Change the port to one that nothing is listening on.

That will result in a java.net.ConnectException: Connection refused error.
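
The same trick is easy to see in a few lines of Python.  This is a minimal sketch, not the monitoring code: it asks the operating system for a free local port, releases it so nothing is listening there, and then checks that a connection attempt is refused.  The helper names are made up, and there is a small chance another process grabs the port in between.

```python
import socket

def find_unused_port():
    """Ask the OS for a free port, then release it so nothing is listening there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 means "pick any free port"
        return s.getsockname()[1]

def connection_refused(host, port, timeout=2.0):
    """Return True if connecting to host:port is actively refused."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False  # something accepted the connection
    except ConnectionRefusedError:
        return True

port = find_unused_port()
print(port, connection_refused("127.0.0.1", port))
```

In Python the refusal surfaces as ConnectionRefusedError, which plays the same role as the Java ConnectException above.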

 

 

Learning Python – make a URL request

Been learning Python lately.  Have done a couple of tutorials and one or two simple programs.  This post will be about a simple example of how to access some real estate information that is available from a company called Trulia.

First, some links to sites that helped me figure this out:

  • Download Python, install, and some reference documentation: https://www.python.org/downloads/
  • Stack Overflow thread about making a URL request in Python: http://stackoverflow.com/questions/17301938/making-a-request-to-a-restful-api-using-python
  • How to make a Trulia RSS feed with your real estate search criteria: https://www.trulia.com/tools/rss/#code_block
  • Some free Python IDEs: http://insights.dice.com/2014/09/23/look-5-free-python-editors/
  • I decided to go with Visual Studio for an IDE. I picked the custom installation so I could specify Python as a supported language.  https://www.visualstudio.com/downloads/

The code:

#Just a little POC to retrieve some real estate information from the web
#Python 3
 
import urllib.request #for opening connections to URLs

def main():

    truliaURL = 'https://www.trulia.com/rss2/for_sale/Burbank,CA/'

    request = urllib.request.Request(truliaURL)
    # the with block closes the connection once the response has been read
    with urllib.request.urlopen(request) as response:
        responseData = response.read().decode('utf-8')
    print(responseData)

if __name__ == "__main__":
    main()
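
Printing the raw response is only the first step.  As a minimal sketch of what might come next, this pulls the title and link out of each item, assuming the feed is ordinary RSS 2.0 with item elements that contain title and link.  I haven't verified the exact layout of the Trulia feed, so the sample XML here is made up and is parsed offline.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the text that response.read().decode('utf-8') returns
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Homes for sale</title>
    <item>
      <title>123 Example St</title>
      <link>https://example.com/listing/1</link>
    </item>
    <item>
      <title>456 Sample Ave</title>
      <link>https://example.com/listing/2</link>
    </item>
  </channel>
</rss>"""

def extract_items(rss_text):
    """Return a (title, link) tuple for every item element in the feed."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]

for title, link in extract_items(SAMPLE_RSS):
    print(title, link)
```

To use it with the real feed you would pass responseData to extract_items instead of the sample string.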

Debugging BlazeDS Performance

I have a server call that returns a lot of data to the client.  About 80% of the elapsed time for the call is the time after the data leaves the server and before it arrives at the client.

We’re using BlazeDS to communicate between a Flex client and a Java back end.

It is crazy hard to find much information on performance tuning for BlazeDS.  But these two blog posts helped me figure out how to get some statistics about the messaging between server and client – things like elapsed time, size of message, etc.

So basically what you do is, on the server side in your services-config.xml file, in the <properties> tag of your <channel-definition> tag, you add these two tags:

 <record-message-times>true</record-message-times>
 <record-message-sizes>true</record-message-sizes>

Then on the client side, when you instantiate a new AMFChannel, you add a line like this:

applySettings(customSettings());

And add a new function called customSettings which sets the values for these same properties on the client-side.

 private function customSettings():XML {
 return <channel-definition>
 <properties>
 <record-message-times>true</record-message-times>
 <record-message-sizes>true</record-message-sizes>
 </properties>
 </channel-definition>;
 }

Doing it this way gets around the problem of them being read-only properties.

Now, upon successful completion of service calls, add this code to your result handler:

log.debug(getResource(msg, destination, operationName, elapsedTime));
var performanceDetails:MessagePerformanceUtils = new MessagePerformanceUtils(event.message);
log.debug(performanceDetails.prettyPrint());

More information on MessagePerformanceUtils here: http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/mx/messaging/messages/MessagePerformanceUtils.html#methodSummary

What this does is get you a bunch of information to help with figuring out why the call is taking so long.

Original message size(B): 1685
Response message size(B): 6987918
Total time (s): 8.402
Network Roundtrip time (s): 3.584
Server processing time (s): 4.818
Server adapter time (s): 3.708
Server non-adapter time (s): 1.11
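
To turn a report like this into shares of the total, a few lines of Python are enough.  This is a minimal sketch that parses the text exactly as pasted above.  The function names are made up, and it assumes every line has the form "label: number".

```python
REPORT = """\
Original message size(B): 1685
Response message size(B): 6987918
Total time (s): 8.402
Network Roundtrip time (s): 3.584
Server processing time (s): 4.818
Server adapter time (s): 3.708
Server non-adapter time (s): 1.11
"""

def parse_report(text):
    """Map each label in the report to its numeric value."""
    stats = {}
    for line in text.splitlines():
        label, sep, value = line.rpartition(":")
        if sep:  # ignore lines without a colon
            stats[label.strip()] = float(value)
    return stats

def share_of_total(stats, label):
    """Fraction of the total time spent in the named phase."""
    return stats[label] / stats["Total time (s)"]

stats = parse_report(REPORT)
for label in ("Network Roundtrip time (s)", "Server processing time (s)"):
    print(label, format(share_of_total(stats, label), ".1%"))
```

In this report the network roundtrip and server processing times add up to the total, and the adapter and non-adapter times add up to the server processing time, so the shares tell you which phase to look at first.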

Don’t leave this in your production code.  You’re trying to improve performance, not make it worse!

Hibernate ProjectionList and ResultTransformer to solve problem of massive queries with endless joins

Hibernate can make query building and entity mapping easy, but if you let it take too much control you can also end up with huge queries that create a big performance drag.

If you know you only need data from a few specific columns of your table or tables, you can use a ProjectionList to target just those columns, and use a ResultTransformer to form the raw results into the sparsely populated entity.

To illustrate the point, let’s take an example of an invoice and its line items.  We have a one-to-many relationship between the invoice table and the line item table.  From the invoice table, we want the invoice_id, invoice_number, invoice_date columns.  The invoice table is linked to a vendor table, and from that table we want the vendor_name column.  From the invoice line items, we want the line_item_number, amount and description columns.

Criteria criteria = session.createCriteria(Invoice.class);
criteria.createAlias("vendor", "v");
criteria.createAlias("lineItems", "li");
criteria.setProjection(Projections.projectionList()
    .add(Property.forName("id"))
    .add(Property.forName("invoiceNumber"))
    .add(Property.forName("invoiceDate"))
    .add(Property.forName("v.vendorName"))
    .add(Property.forName("li.lineItemNumber"))
    .add(Property.forName("li.amount"))
    .add(Property.forName("li.description")));

You need to specify aliases for “vendor” and “lineItems” in order to be able to specify the properties from those related entities.  If you had a situation where there wasn’t always a vendor, but you wanted information from invoices that didn’t have vendors, you would specify the alias like this:

criteria.createAlias("vendor", "v", Criteria.LEFT_JOIN);

Since you have a one-to-many relationship from invoice to lineItem, you will get a separate object from this query for every line item.  You will get the same invoice information repeated, with different line item information.  In other words, if you have an invoice that has two line items in it, the raw data returned from the query will look like this:

o = {java.lang.Object[7]}
[0] = 1234   -- the internal ID for the invoice
[1] = "LAP-12355" -- the invoice number
[2] = "12/31/2016" -- the invoice date
[3] = "George's Great Grill" -- the vendor name
[4] = "1" -- the invoice line item number
[5] = 400.23 -- the invoice line item amount
[6] = "ribs" -- the invoice line item description

And then you might have a second object returned with the same information exactly in elements 0 through 3, but with the following for elements 4, 5, and 6:

[4] = "2" -- the invoice line item number
[5] = 20.00 -- the invoice line item amount
[6] = "delivery charge" -- the invoice line item description

This is where the ResultTransformer comes in.  ResultTransformer is an interface whose methods transform the raw results returned from the SQL query into the entity you use in your Java code.  There are built-in ResultTransformers you can use.  For this example we will write our own, to illustrate how it works.

Specify the ResultTransformer on your criteria like this:

criteria.setResultTransformer(new ResultTransformer()
{
    @Override
    public Object transformTuple(Object[] o, String[] strings)
    {
        return transformObjectToInvoice(o);
    }

    @Override
    public List transformList(List list)
    {
        return consolidateInvoices(list);
    }
});

Then you write a private method transformObjectToInvoice that takes an Object[] and returns an Invoice.  Every invoice it returns will have exactly one line item.  And you write a private method consolidateInvoices that takes a List and returns a List.  The incoming list will have invoices with only one line item each; the outgoing list will have fewer invoices, and the invoices will have 1 to n line items apiece.

So your transformObjectToInvoice will look something like this:

private Invoice transformObjectToInvoice(Object[] o)
{
    Invoice invoice = new Invoice();
    invoice.setId((Integer) o[0]);
    invoice.setInvoiceNumber((String) o[1]);
    // ...set the remaining invoice fields and the single line item from o[2] through o[6]
    return invoice;
}

And your consolidateInvoices will look something like this:

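private List<Invoice> consolidateInvoices(List<Invoice> list)
{
    List<Invoice> consolidatedInvoices = new ArrayList<>();
    // keyed by invoice ID so that rows belonging to the same invoice get merged
    Map<Integer, Invoice> invoices = new HashMap<>();
    for (Invoice oneLineItemInvoice : list)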
    {
        Invoice mapInv = invoices.get(oneLineItemInvoice.getId());
        if (mapInv != null)
        {
             mapInv.getLineItems().add(oneLineItemInvoice.getLineItems().iterator().next());
        }
        else
        {
            invoices.put(oneLineItemInvoice.getId(), oneLineItemInvoice);
        }
    }
    consolidatedInvoices.addAll(invoices.values());
    return consolidatedInvoices;
}
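
If it helps to see the consolidation step on its own, away from Hibernate, here is the same grouping idea as a rough Python sketch.  It works on plain tuples shaped like the raw rows shown earlier (the two sample rows are the ones from this post), and the dictionary keys are names I made up for illustration.

```python
def consolidate(rows):
    """Group flat (invoice + one line item) rows into one record per invoice."""
    invoices = {}  # invoice id -> invoice record, in first-seen order
    for inv_id, number, date, vendor, line_no, amount, description in rows:
        invoice = invoices.setdefault(inv_id, {
            "id": inv_id,
            "invoice_number": number,
            "invoice_date": date,
            "vendor_name": vendor,
            "line_items": [],
        })
        invoice["line_items"].append({
            "line_item_number": line_no,
            "amount": amount,
            "description": description,
        })
    return list(invoices.values())

raw_rows = [
    (1234, "LAP-12355", "12/31/2016", "George's Great Grill", "1", 400.23, "ribs"),
    (1234, "LAP-12355", "12/31/2016", "George's Great Grill", "2", 20.00, "delivery charge"),
]
for invoice in consolidate(raw_rows):
    print(invoice["invoice_number"], len(invoice["line_items"]))
```

The two raw rows collapse into a single invoice with two line items, which is what consolidateInvoices does with the one-line-item Invoice objects.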